Health Checks
Health Over Time Graph
At the top of the Health Checks page is a graph showing the progression of health scores over time in 1-, 5-, 30-, or 60-minute intervals. The Health Over Time graph is a live tracker of the health tests that Sosivio continuously runs for each of the layers.
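
To illustrate how test results could be grouped into the intervals mentioned above, here is a minimal sketch. The function name `bucket_scores`, the `(timestamp, passed)` input shape, and the scoring rule (percentage of passing tests per bucket) are all assumptions made for illustration; this page does not document how Sosivio computes its health scores.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Interval widths offered by the graph, in minutes (taken from the text above).
INTERVALS_MINUTES = (1, 5, 30, 60)


def bucket_scores(results, interval_minutes):
    """Group (timestamp, passed) results into fixed-width time buckets.

    Returns {bucket_start: score}, where score is the percentage of
    passing results in that bucket. The scoring rule is an assumption
    for illustration, not Sosivio's documented formula.
    """
    if interval_minutes not in INTERVALS_MINUTES:
        raise ValueError(f"unsupported interval: {interval_minutes}")
    width = interval_minutes * 60
    buckets = defaultdict(list)
    for ts, passed in results:
        epoch = int(ts.timestamp())
        # Round the timestamp down to the start of its bucket.
        start = datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)
        buckets[start].append(passed)
    return {
        start: 100.0 * sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }
```

Calling `bucket_scores(results, 5)` with timezone-aware timestamps returns one score per 5-minute window that contains at least one result; empty windows are simply absent from the output.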

The Health Checks page is where you can view the history of your cluster's health. Here you can see test results for each category of the cluster:
- Platform: Tests related to the Kubernetes components and infrastructure
- Application: Tests related to the behavior of applications on the cluster
- Deployment: Tests related to the configuration of various components and how applications are being deployed on the cluster

Platform Health Checks
-
Node Ready
- Pass: The Node is operational
- Fail: The Node is not operational
- Impact: The Node is not in a Ready state and will not allow Pods to be deployed on it
-
Node Disk Space
- Pass: The Node disk has adequate space available to deploy Pods
- Fail: The Node disk space is insufficient to deploy Pods
- Impact: The Node disk space is scarce. Pod eviction may occur on this Node and new Pods may not be deployed on it
-
Node Load Average
- Pass: The Node load average is OK
- Fail: The Node load average is high
- Impact: The Node load average is dangerously high. Processes running on this Node might experience resource-induced latency
-
Node DNS Latency
- Pass: The Node DNS services latency is OK
- Fail: The Node DNS services latency is high
- Impact: The DNS query latency on the Node is high. This can cause slow responses between other Pods and Services
-
Node SDN Latency
- Pass: The Node SDN latency is OK
- Fail: The Node SDN latency is high
- Impact: The SDN latency on the Node is high. Pods and processes running on this Node may experience slow response times. In addition, the Node may be in an oscillating state (Ready/NotReady)
-
Kubernetes API Server Stability
- Pass: The Kubernetes API server Pod is stable
- Fail: The Kubernetes API server Pod is restarting
- Impact: The Kubernetes API server is continuously restarting. API requests may fail or time out intermittently
-
API Server Running
- Pass: The Kubernetes API server is running
- Fail: The Kubernetes API server is not running
- Impact: The Kubernetes API server is not running. Kubernetes operations will be disrupted and time out
-
Kubernetes Controller Manager Running
- Pass: The Kubernetes controller manager is running
- Fail: The Kubernetes controller manager is not running
- Impact: The Kubernetes controller manager is not running. New Deployments, Pod restarts, and other various automatic operations may fail
-
Controller Manager Stability
- Pass: The Kubernetes controller manager is stable
- Fail: The Kubernetes controller manager is restarting
- Impact: The Kubernetes controller manager is continuously restarting. Pod Deployments may experience latency
-
Master Node Ready
- Pass: The Kubernetes master is ready
- Fail: The Kubernetes master is not ready
- Impact: The Kubernetes master is in a NotReady state. Cluster operations may fail
-
Node CPU Throttle
- Pass: The Node CPU is stable
- Fail: The Node CPU is throttling
- Impact: The Node CPU is under heavy load. Processes running on this Node may run slower
-
Node Memory Consumption
- Pass: The Node memory consumption is OK
- Fail: The Node memory consumption is high
- Impact: The Node memory consumption is high. Processes may be killed and Pods may be evicted
-
Scheduler Pod Running
- Pass: The Kubernetes scheduler Pod is running
- Fail: The Kubernetes scheduler Pod is not running
- Impact: New Pods and/or evicted Pods will not be deployed as normal
-
Kubernetes Scheduler Pod Stability
- Pass: The Kubernetes scheduler Pod is stable
- Fail: The Kubernetes scheduler Pod is not stable
- Impact: New Pods and/or evicted Pods may be deployed only intermittently
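
Several of the Node checks above resemble the standard Kubernetes Node conditions (`Ready`, `DiskPressure`, `MemoryPressure`). The sketch below shows one way such conditions could be read from a Node object shaped like an item of `kubectl get nodes -o json`. The function name and the mapping from conditions to check names are assumptions for illustration; how Sosivio actually evaluates these checks is not described here.

```python
def node_condition_failures(node):
    """Return the names of condition-based checks that fail for a Node.

    `node` is a dict shaped like one item of `kubectl get nodes -o json`.
    The condition-to-check mapping is a hypothetical one for illustration.
    """
    conditions = {
        c["type"]: c["status"]
        for c in node.get("status", {}).get("conditions", [])
    }
    failures = []
    # Ready must be "True"; "False", "Unknown", or missing all count as not ready.
    if conditions.get("Ready") != "True":
        failures.append("Node Ready")
    # Pressure conditions are healthy when "False", so only "True" is a failure.
    if conditions.get("DiskPressure") == "True":
        failures.append("Node Disk Space")
    if conditions.get("MemoryPressure") == "True":
        failures.append("Node Memory Consumption")
    return failures
```

An empty list means none of the three condition-based checks failed. Checks such as load average, DNS latency, or SDN latency need measurements that are not part of the Node object and are out of scope for this sketch.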
Application Health Checks
-
Application Pod Stability
- Pass: The Pod is running
- Negative: The Pod is not running
- Impact: The Pod is not running (NodeLost, Failed)
- Specific Failure Scenarios:
- PodCrashing: The Pod is crashing
- Impact: The Pod could not reach a running state due to a failure
- PodPending: The Pod is in a Pending state
- Impact: The Pod is stuck in a Pending state
- PodTerminating (more than 10 minutes): The Pod has been in a Terminating state for more than 10 minutes
- Impact: The Pod is stuck in a Terminating state
- PodContainerCreating (more than 10 minutes): The Pod has been in a ContainerCreating state for more than 10 minutes
- Impact: The Pod is stuck in a ContainerCreating state
-
Deployment Pod Running
- Pass: The Deployment Pod is running OK
- Fail: The Deployment Pod is restarting
- Impact: The Pod is continuously restarting
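
The failure scenarios listed above can be approximated from a Pod object shaped like `kubectl get pod -o json` output. This is a minimal sketch under stated assumptions: the function `classify_pod`, the use of `CrashLoopBackOff` as the signal for PodCrashing, and the choice of timestamps for the 10-minute threshold are illustrative and not taken from Sosivio.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=10)  # threshold named in the scenarios above


def _parse(ts):
    # Kubernetes timestamps are RFC 3339, e.g. "2024-01-01T12:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def classify_pod(pod, now):
    """Map a Pod dict to one of the scenario names above, or None."""
    meta, status = pod.get("metadata", {}), pod.get("status", {})
    # deletionTimestamp marks the end of the grace period, so this flags
    # Pods still present more than 10 minutes after they should be gone.
    deleted_at = meta.get("deletionTimestamp")
    if deleted_at and now - _parse(deleted_at) > STUCK_AFTER:
        return "PodTerminating"
    waiting = [
        cs["state"]["waiting"].get("reason")
        for cs in status.get("containerStatuses", [])
        if "waiting" in cs.get("state", {})
    ]
    if "CrashLoopBackOff" in waiting:
        return "PodCrashing"
    if "ContainerCreating" in waiting:
        started = status.get("startTime") or meta.get("creationTimestamp")
        if started and now - _parse(started) > STUCK_AFTER:
            return "PodContainerCreating"
        return None  # still within the normal startup window
    # The text gives no time threshold for Pending, so none is applied here.
    if status.get("phase") == "Pending":
        return "PodPending"
    return None
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the classification deterministic and easy to test.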
Deployment Health Checks
-
Service Endpoint Pod(s) Present
- Pass: The service has at least one Pod available as an endpoint
- Negative: The service has no Pods available as endpoints
- Impact: The service has no endpoint Pods to connect to and therefore will not be available
-
Number of Endpoints
- Pass: The service has all the replicas defined in the Deployment available as endpoints
- Negative: The service has fewer Pods available as endpoints than the replicas defined in the Deployment
- Impact: The service does not have all the replicas defined in the Pod Deployment running
-
Application Deployment Status
- Pass: The Deployment Pods are all running and using the latest Pod version
- Negative: The Deployment Pods are not all running and/or are not using the latest Pod version
- Impact: Not all Deployment Pods are running and/or are not running the Pod version specified in the Deployment configuration
-
Latest Application Deployment Update
- Pass: The Application Deployment's latest update was successful
- Negative: The Application Deployment's latest update failed
- Impact: The Pods for this Deployment are not updated with the latest Pod version
-
ReplicaSet Status
- Pass: The ReplicaSet Pods are all running and match the Pod version in the Deployment configuration
- Fail: The ReplicaSet Pods are not all running and/or do not match the Pod version in the Deployment configuration
- Impact: The Pods for this ReplicaSet are not updated with the latest Pod version in the Deployment configuration
-
StatefulSet Status
- Pass: The StatefulSet Pods are all running and match the Pod version in the Deployment configuration
- Fail: The StatefulSet Pods are not all running and/or do not match the Pod version in the Deployment configuration
- Impact: The Pods for this StatefulSet are not updated with the latest Pod version in the Deployment configuration
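
Two of the Deployment checks above can be sketched against standard Kubernetes objects: a core v1 Endpoints object for "Service Endpoint Pod(s) Present", and a Deployment's `spec` and `status` for "Application Deployment Status". The function names are hypothetical and the logic is an illustrative assumption, not Sosivio's implementation.

```python
def service_has_endpoints(endpoints):
    """True if an Endpoints object lists at least one ready address.

    `endpoints` is a dict shaped like `kubectl get endpoints <name> -o json`.
    Addresses under `notReadyAddresses` are deliberately not counted.
    """
    return any(
        subset.get("addresses")
        for subset in endpoints.get("subsets") or []
    )


def deployment_is_current(deployment):
    """True if every desired replica is updated and available.

    `deployment` is a dict shaped like `kubectl get deployment <name> -o json`.
    """
    # spec.replicas defaults to 1 when it is not set.
    desired = deployment.get("spec", {}).get("replicas", 1)
    status = deployment.get("status", {})
    return (
        status.get("updatedReplicas", 0) == desired
        and status.get("availableReplicas", 0) == desired
        # Total replicas equal to desired means no old-version Pods linger.
        and status.get("replicas", 0) == desired
    )
```

Comparing `status.replicas` as well as `updatedReplicas` catches a rollout that is still in progress, when old and new Pods run side by side.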