Troubleshooting

If you need help, please use the chat widget on the website (bottom right corner of this page), email hello@kuber.host, or join our Slack at kuberhost.slack.com

Pod stuck in Terminating state

First, we should find out why it happened:

$ kubectl describe pod <pod name>

...
Events:
  Type    Reason   Age              From            Message
  ----    ------   ----             ----            -------
  Normal  Pulling  6s (x2 over 8h)  kubelet, ebox2  pulling image "kuberhost/docker-lobsters"
  Normal  Pulled   4s (x2 over 8h)  kubelet, ebox2  Successfully pulled image "kuberhost/docker-lobsters"
  Normal  Created  4s (x2 over 8h)  kubelet, ebox2  Created container
  Normal  Started  4s (x2 over 8h)  kubelet, ebox2  Started container

We can try to delete the pod again:

$ kubectl delete pod <pod name>

If nothing else works, we can delete the pod with the --force flag:

$ kubectl delete pod <pod name> --force --grace-period 0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "<pod name>" force deleted

Pod Keeps Restarting or Has Status CrashLoopBackOff

This usually happens when the process in the container stops.

To see the pod logs, run:

$ kubectl logs <pod name> --tail 200 --follow

If the pod has already restarted, we can check the logs from the previous run:

$ kubectl logs <pod name> --tail 200 --previous
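
The RESTARTS column of the regular pod listing shows how often the container is crashing (output sketch, assuming a pod named myapp):

$ kubectl get pod myapp
NAME    READY   STATUS             RESTARTS   AGE
myapp   0/1     CrashLoopBackOff   12         1h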

A pod can also be stopped when it tries to consume more memory than allowed:

$ kubectl describe pod <pod name>
...
    State:          Terminated
      Reason:       OOMKilled
      Exit Code:    1
      Started:      Mon, 28 September 2018 04:47:12 +0000
      Finished:     Mon, 28 September 2018 04:47:12 +0000
    Last State:     Terminated
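
If the reason is OOMKilled, the usual fix is to raise the memory limit on the controlling object (or reduce the app's memory usage). A sketch, assuming the pod belongs to a deployment named myapp and that 512Mi fits your quota:

$ kubectl set resources deployment myapp --limits memory=512Mi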

HTTPS Certificate Is Invalid, Missing, or Expired

One possible reason: your resource quota is full and cert-manager cannot create the temporary pod it needs to obtain the certificate. To solve it, we can temporarily reduce the resource usage of our pods, or scale a deployment to 0 replicas and wait until the Kubernetes TLS secret is updated.
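
To confirm that the quota is the problem, compare used against hard limits:

$ kubectl describe resourcequota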

Scale command:

$ kubectl scale deployment --replicas 0 myapp
< wait some time >
$ kubectl get secret
NAME                         TYPE                                  DATA      AGE
default-token-gbpb6          kubernetes.io/service-account-token   3         2h
myapp-dashboard-token-79nmj  kubernetes.io/service-account-token   3         2h
myapp-kuber-host             kubernetes.io/tls                     2         10s
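
Once the kubernetes.io/tls secret shows up, scale the deployment back up (assuming it originally ran 1 replica):

$ kubectl scale deployment --replicas 1 myapp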

Additional monitoring is always helpful; to check website availability and SSL status, you can use something like uptimerobot.com.
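
You can also check the served certificate and its expiry dates by hand; a quick sketch, assuming your site is myapp.kuber.host:

$ echo | openssl s_client -connect myapp.kuber.host:443 -servername myapp.kuber.host 2>/dev/null | openssl x509 -noout -dates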

Pods not Being Created

This usually happens when the controlling object (Deployment, CronJob, StatefulSet, etc.) is missing some requirement: an exceeded resource quota, or a missing secret, config map, or persistent volume.

How to investigate:

$ kubectl get all
NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/mysql      ClusterIP   10.233.7.25   <none>        3306/TCP   2d9h

NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql      1         0         0            0           33s  <-- this line matters

NAME                                 DESIRED   CURRENT   READY     AGE
replicaset.apps/mysql-84f4c75fb5     1         0         0         33s

$ kubectl describe replicaset mysql-84f4c75fb5
...
Events:
  Type     Reason        Age               From                   Message
  ----     ------        ----              ----                   -------
  Warning  FailedCreate  1m                replicaset-controller  Error creating: pods "mysql-84f4c75fb5-2wmrx" is forbidden: exceeded quota: paxa-quota, requested: limits.memory=850Mi, used: limits.memory=250Mi, limited: limits.memory=1000Mi
...

You can also use kubectl get events to see recent events for all Kubernetes objects.
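
Events are not sorted by default, so sorting by timestamp makes the most recent failures easy to spot:

$ kubectl get events --sort-by .lastTimestamp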

By checking the ReplicaSet we can see the reason: the deployment's resource spec needs to be adjusted, and then the new pod will be created.
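
In the example above the quota caps limits.memory at 1000Mi and 250Mi is already used, so the new pod can request at most 750Mi. A sketch of the fix, reusing the mysql deployment from the example with an arbitrary 700Mi limit:

$ kubectl set resources deployment mysql --limits memory=700Mi

The ReplicaSet controller retries automatically, so the pod should be created shortly after the new limit fits the quota.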