Microservices, now the de facto choice for how we build our infrastructure, naturally paved the way for containers. With containers and orchestration tools like Docker and Kubernetes, organizations can now ship applications more quickly and at greater scale. But with all that power and automation come challenges, especially around maintaining visibility into this ephemeral infrastructure.

[Figure: Kubernetes architecture diagram showing master and worker nodes]

Monitoring Kubernetes Workloads

Kubernetes is complex (to find out exactly what Kubernetes is and how it works, read our complete guide on Kubernetes). Using it successfully requires monitoring several components simultaneously. To simplify your monitoring strategy, separate monitoring operations into several areas, each corresponding to an individual layer of the Kubernetes environment. Then break down workload monitoring from the top down: clusters, pods, applications, and finally, the end-user experience.

Monitoring Kubernetes Clusters

The cluster is the highest-level constituent of Kubernetes. Most Kubernetes installations have just one cluster, so monitoring at the cluster level gives you a full view across all areas and lets you easily ascertain the health of the pods, nodes, and apps that make up the cluster.

When deploying multiple clusters using federation, each cluster must be monitored individually. The areas to monitor at the cluster level are:

  • Unsuccessful pods: Pods that fail and abort are a normal part of Kubernetes processes. But when a pod that should be running is inactive, or is working less efficiently than expected, it’s critical to look into the reason behind the failure.
  • Node load: Tracking the load on each node is integral to monitoring efficiency. Some nodes may see far more usage than others, and rebalancing the load distribution is key to keeping workloads fluid and effective. A monitoring agent deployed as a DaemonSet runs on every node and can collect this data.
  • Cluster usage: Monitoring cluster infrastructure allows you to adjust the number of nodes in use and dedicate resources to power workloads efficiently. You can see how resources are being distributed so you can scale up or down and avoid the costs of additional infrastructure. To that end, we recommend learning how to set a container’s memory and CPU usage limits.
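As a rough illustration of the "unsuccessful pods" check, the sketch below counts pods per phase and flags pods whose containers have restarted. It assumes input shaped like the output of `kubectl get pods -o json` (a dict with an `items` list); the sample data and the function name `summarize_pods` are hypothetical.

```python
from collections import Counter


def summarize_pods(pod_list):
    """Return (pods per phase, [(pod name, total container restarts), ...])."""
    phases = Counter()
    restarting = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phases[status.get("phase", "Unknown")] += 1
        # restartCount is reported per container; sum it for the whole pod.
        restarts = sum(
            cs.get("restartCount", 0)
            for cs in status.get("containerStatuses", [])
        )
        if restarts > 0:
            restarting.append((pod["metadata"]["name"], restarts))
    return dict(phases), restarting


# Hypothetical sample, trimmed to the fields the function reads.
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "web-1"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"restartCount": 0}]}},
        {"metadata": {"name": "api-1"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"restartCount": 4}]}},
        {"metadata": {"name": "batch-7"},
         "status": {"phase": "Failed"}},
    ]
}

if __name__ == "__main__":
    print(summarize_pods(SAMPLE_PODS))
```

A pod in the Running phase with a climbing restart count is often as interesting as one in the Failed phase, which is why the sketch reports both.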

Monitoring Kubernetes Pods

Cluster monitoring gives a macro view of the Kubernetes environment, but collecting data from individual pods is also essential. It reveals the health of individual pods and the workloads they are hosting. You get a clearer picture of pod performance at a granular level, beyond the cluster. Here you would monitor:

  • Total pod instances: There need to be enough instances of a pod to ensure high availability, but no more than needed, so that hosting bandwidth is not wasted.
  • Pod deployment: Monitoring pod deployments allows you to see if any misconfigurations might be diminishing the availability of pods. It’s critical to keep an eye on how resources are distributed across nodes.
  • Actual pod instances: Comparing the number of instances actually running for each pod with the number you expected to be running reveals how you can redistribute resources to reach the desired state. If the two numbers diverge, ReplicaSets could be misconfigured, so it’s important to analyze these regularly.
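The comparison of expected and actual pod instances can be sketched as follows. This assumes input shaped like `kubectl get deployments -o json`, reading `spec.replicas` as the desired count and `status.readyReplicas` as the actual count; the sample data and the name `replica_drift` are hypothetical.

```python
def replica_drift(deployment_list):
    """Return [(name, desired, ready)] for deployments not at desired state."""
    drift = []
    for dep in deployment_list.get("items", []):
        # spec.replicas defaults to 1 when omitted; status.readyReplicas is
        # left out of the API response when no replica is ready.
        desired = dep.get("spec", {}).get("replicas", 1)
        ready = dep.get("status", {}).get("readyReplicas", 0)
        if ready != desired:
            drift.append((dep["metadata"]["name"], desired, ready))
    return drift


# Hypothetical sample, trimmed to the fields the function reads.
SAMPLE_DEPLOYMENTS = {
    "items": [
        {"metadata": {"name": "web"},
         "spec": {"replicas": 3}, "status": {"readyReplicas": 3}},
        {"metadata": {"name": "api"},
         "spec": {"replicas": 5}, "status": {"readyReplicas": 2}},
        {"metadata": {"name": "worker"},
         "spec": {"replicas": 2}, "status": {}},
    ]
}

if __name__ == "__main__":
    for name, desired, ready in replica_drift(SAMPLE_DEPLOYMENTS):
        print(f"{name}: desired={desired} ready={ready}")
```

Brief drift during a rollout is normal, so in practice you would probably alert only when the mismatch persists for some time.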

Monitoring Applications Running in Kubernetes

Applications are not a part of Kubernetes, but hosting an application is the whole point of using Kubernetes. That’s why monitoring the applications hosted on the cluster is integral to success. The issues that application monitoring reveals could stem from the Kubernetes environment or from the application’s code.

By monitoring apps, you can identify glitches and resolve them without delay. Start by monitoring:

  • Errors: Monitoring lets you catch an error quickly and resolve it before it affects end-users.
  • Transaction traces: Transaction traces assist you in troubleshooting if apps experience availability or performance problems.
  • Application responsiveness: You can monitor how long it takes for an app to respond to a request, and see whether it can handle current workloads or is struggling to maintain performance.
  • Application availability: Monitor whether apps are up and responding efficiently.
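To make the error and responsiveness checks concrete, here is a minimal sketch that derives an error rate and latency percentiles from request records. The record shape (HTTP status, latency in milliseconds), the sample values, and the function names are assumptions for illustration; a real APM tool would collect these for you.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_requests(requests):
    """requests: non-empty list of (http_status, latency_ms) tuples."""
    latencies = [latency for _, latency in requests]
    # Count server-side failures (5xx) as errors.
    errors = sum(1 for status, _ in requests if status >= 500)
    return {
        "error_rate": errors / len(requests),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }


# Hypothetical sample of ten requests.
SAMPLE_REQUESTS = [
    (200, 120), (200, 80), (200, 95), (500, 400), (200, 110),
    (200, 105), (200, 90), (503, 2300), (200, 100), (200, 85),
]

if __name__ == "__main__":
    print(summarize_requests(SAMPLE_REQUESTS))
```

Percentiles are used here instead of an average because a few very slow requests, like the 2300 ms one in the sample, are easy to miss in a mean but show up clearly at p95.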

Monitoring End-user Experience when Running Kubernetes

End-user experience, like applications, is technically not a part of the Kubernetes platform. But the overall goal of an application is to give end-users a positive experience, so it should be a part of your Kubernetes monitoring strategy.

Collecting data will let you know how the app is performing, its responsiveness, and its usability. Doing real-user and synthetic monitoring is essential to understand how users interact with Kubernetes workloads. It lets you know if you need to make any adaptations or adjustments which will enhance usability and improve the frontend.

Monitoring Kubernetes in a Cloud Environment

When Kubernetes is running in the cloud, there are specific factors to consider when planning your monitoring strategy. In the cloud, you will also have to monitor:

  • IAM events: You will have to monitor for IAM activity, including permissions changes and logins. Doing so is a security best practice in a cloud-based installation or environment.
  • Cloud APIs: A cloud provider has its own APIs, and your Kubernetes installation uses them to request resources, so they need to be monitored.
  • Costs: Costs in the cloud can quickly run up. Cost monitoring assists you with budgeting and ensures that you do not overspend on cloud-based Kubernetes services.
  • Network performance: In a cloud-based installation, the network can become the largest hindrance to the performance of your applications. If you monitor the cloud network regularly, you can be sure that data is moving as rapidly as needed so that you can avoid network-related problems.

Monitoring Metrics in Kubernetes

Beyond the different types of monitoring described above, several metrics will give you greater visibility into a Kubernetes installation and valuable insight into how your apps are running.

Common Metrics

These are metrics collected from Kubernetes’ own code (written in Go). They allow you to understand what’s going on at a cellular level in the platform.

Node Metrics

Metrics from the operating systems underlying Kubernetes nodes give you insight into the overall health of individual nodes. You can monitor memory consumption, filesystem activity, CPU load and usage, and network activity.

Kubelet Metrics

The Kubelet is the agent that runs on each individual node. To make sure the control plane is communicating efficiently with every node, you should monitor the Kubelet regularly.

Kube-State-Metrics

Kube-State-Metrics is an optional Kubernetes add-on that generates metrics from the Kubernetes API.
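Kube-State-Metrics, like the Kubernetes components themselves, exposes its metrics in the Prometheus text format. The sketch below is a deliberately simplified parser for that format, assuming one sample per line and ignoring optional timestamps; the sample text and function names are hypothetical, while `kube_pod_status_phase` is one of the metrics the add-on publishes.

```python
import re

# name, optional {labels}, then the value (a trailing timestamp is ignored).
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return [(metric name, labels dict, float value)], skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        # Escape sequences inside label values are kept as-is, not decoded.
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def failed_pods(samples):
    """Names of pods whose kube_pod_status_phase series marks them Failed."""
    return [
        labels["pod"]
        for name, labels, value in samples
        if name == "kube_pod_status_phase"
        and labels.get("phase") == "Failed"
        and value == 1
    ]


# Hypothetical scrape output.
SAMPLE_METRICS = """\
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="api-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="api-1",phase="Failed"} 0
kube_pod_status_phase{namespace="default",pod="batch-7",phase="Failed"} 1
"""

if __name__ == "__main__":
    print(failed_pods(parse_metrics(SAMPLE_METRICS)))
```

In practice a Prometheus server or an existing client library would do this parsing; the sketch only shows what the raw data looks like.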

Controller Manager Metrics

To ensure that workloads are orchestrated effectively, you can monitor the requests that the Controller Manager is making to external APIs. This is critical in cloud-based Kubernetes deployments.

Scheduler Metrics

If you want to identify and prevent delays, you should monitor latency in the Scheduler. This way you can ensure Kubernetes is deploying pods smoothly and on time.
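Latency metrics of this kind are usually exposed as histograms with cumulative buckets, so a percentile has to be estimated. The sketch below shows one common approach, linear interpolation inside the bucket that contains the target rank, similar in spirit to what Prometheus does. The bucket values are hypothetical, and this is a simplified approximation, not the scheduler's own calculation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from (upper_bound, cumulative_count) pairs.

    The last bucket is expected to have an upper bound of +Inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate into +Inf: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical scheduling latency buckets, in seconds.
SAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (float("inf"), 100)]

if __name__ == "__main__":
    print(histogram_quantile(0.95, SAMPLE_BUCKETS))
```

The accuracy of the estimate depends on the bucket boundaries, so a sudden jump in the result may simply mean that the rank crossed into a wider bucket.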

Etcd Metrics

Etcd stores all the configuration data for Kubernetes. Etcd metrics will give you essential visibility into the condition of your cluster.

Container Metrics

Looking specifically into individual containers allows you to monitor exact resource consumption rather than relying on more general Kubernetes metrics. cAdvisor allows you to analyze resource usage happening inside containers.
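Container CPU usage is typically reported as a cumulative counter of CPU seconds (cAdvisor exposes it as `container_cpu_usage_seconds_total`), so the useful number is the rate of change between two samples. The following sketch shows that calculation; the sample values, the two-core limit, and the function names are assumptions for illustration.

```python
def cpu_cores_used(earlier, later):
    """Average CPU cores used between two (unix_time, cpu_seconds) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in chronological order")
    # A counter that went down means the container restarted; treat the
    # new value as the increase since the reset.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


def limit_utilization(cores_used, cpu_limit_cores):
    """Fraction of the container's CPU limit that is in use."""
    return cores_used / cpu_limit_cores


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart.
    used = cpu_cores_used((1000, 100.0), (1060, 130.0))
    print(f"{used} cores, {limit_utilization(used, 2.0):.0%} of a 2-core limit")
```

Comparing usage against the configured limit, as in `limit_utilization`, helps show whether a container is close to being throttled.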

API Server Metrics

The API Server holds the Kubernetes frontend together, so these metrics are vital for gaining visibility into the API Server, and thereby into the whole frontend.

Log Data

Logs are useful to examine when metrics reveal a problem. They give you exact, detailed information that metrics alone cannot provide. Most Kubernetes components offer many logging options, and applications also generate log data.

Kubernetes Monitoring Challenges, Solutions and Tips

Migrating applications from monolithic infrastructures to microservices managed by Kubernetes is a long and intensive process. It can be full of pitfalls and can prove to be error-prone. But to achieve higher availability, innovation, cost benefits, scalability, and agility, it’s the only way to grow your business, especially in the cloud. Visibility is the main issue when it comes to Kubernetes environments as seeing real-time interactions of each microservice is challenging, due to the complexity of the platform. Monitoring is a specialized skill each enterprise will need to practice and improve upon to be successful.

A Kubernetes cluster can be considered complex due to its multiple servers and integrated private and public cloud services. When an issue arises, there are many logs, data points, and other factors to examine. Legacy monolithic environments need only a few log searches to ascertain the problem. Kubernetes environments, on the other hand, have one or several logs for each of the microservices implicated in the issue you’re experiencing.

To address these challenges, we’ve put together the following recommendations for effectively monitoring containerized infrastructure.

Effective Use of the Sidecar Pattern for Improved Application Monitoring in Kubernetes

One key best practice is leveraging role-based access within Kubernetes to give a single team end-to-end control of its monitoring solution without giving it full control of the cluster. Running a monitoring solution under a team namespace helps operators easily control monitoring for their microservice-based container application within the scope of their team.

With the sidecar pattern, where the monitoring agent runs as a separate container alongside the application in the same pod, they can also add monitoring support without having to rebuild their application container. This dynamic approach to monitoring improves observability and provides context without having to pull containers down if they start to exhibit issues.
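In the sidecar pattern, the monitoring agent is added to the pod as its own container, so the application image stays untouched. The sketch below shows that idea on a pod manifest held as a Python dict. The image names, pod name, namespace, and the helper `with_sidecar` are all hypothetical.

```python
import copy


def with_sidecar(pod_manifest, sidecar_container):
    """Return a copy of the pod manifest with the sidecar container added."""
    patched = copy.deepcopy(pod_manifest)
    containers = patched["spec"].setdefault("containers", [])
    if any(c["name"] == sidecar_container["name"] for c in containers):
        raise ValueError("a container with that name already exists")
    # The application container is left exactly as it was.
    containers.append(copy.deepcopy(sidecar_container))
    return patched


# Hypothetical manifests; the images do not refer to real artifacts.
APP_POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "orders", "namespace": "team-a"},
    "spec": {
        "containers": [
            {"name": "app", "image": "example.com/team-a/orders:1.0"},
        ]
    },
}
MONITORING_SIDECAR = {
    "name": "monitoring-agent",
    "image": "example.com/team-a/monitoring-agent:1.0",
}

if __name__ == "__main__":
    patched = with_sidecar(APP_POD, MONITORING_SIDECAR)
    print([c["name"] for c in patched["spec"]["containers"]])
```

Because containers in a pod share a network namespace, an agent added this way can reach the application over localhost.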

Namespace Observability

By leveraging an open source monitoring event pipeline, such as Sensu Go, operations teams can get a dedicated team view of containers to improve visibility into their applications and increase insight into possible anomalies. These types of solutions offer dynamic monitoring changes for ephemeral infrastructure. As a result, operators can help drive collaboration securely by using Kubernetes’ built-in concept for role-based access control.

Kubernetes provides namespace scoping for resources, making it possible to give individual teams full control of applications under their namespace. Operators can also create containers and pods in a Kubernetes namespace and map them directly to code-driven monitoring tools that use the same namespace.

For example, you can have an associated namespace in an open source monitoring event pipeline, similar to the one in Kubernetes, so that one team can control its containers and the monitoring around them using a repository of declarative YAML config files. With RBAC (role-based access control), you can mitigate risk by granting each user only the access they need.
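As one possible shape for such least-privilege access, the sketch below builds a namespaced read-only Role and a RoleBinding for a team group, using the `rbac.authorization.k8s.io/v1` API. The role name, the namespace, the group name, and the chosen resource list are assumptions; adjust them to what your monitoring tool actually needs.

```python
import json


def monitoring_rbac(namespace, team_group):
    """Build a read-only Role and a RoleBinding scoped to one namespace."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "monitoring-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group
            "resources": ["pods", "pods/log", "services", "endpoints"],
            "verbs": ["get", "list", "watch"],  # read-only access
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "monitoring-reader-binding",
                     "namespace": namespace},
        "subjects": [{
            "kind": "Group",
            "name": team_group,
            "apiGroup": "rbac.authorization.k8s.io",
        }],
        "roleRef": {
            "kind": "Role",
            "name": "monitoring-reader",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [role, binding]


if __name__ == "__main__":
    # Hypothetical namespace and group names.
    for manifest in monitoring_rbac("team-a", "team-a-operators"):
        print(json.dumps(manifest, indent=2))
```

The manifests are printed as JSON here to stay within the standard library; the same structures can be kept as YAML files in the team repository.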

Codifying monitoring workflows into declarative configuration files allows you to monitor at the speed of automation. These files can be shared, treated as code, reviewed, edited, and versioned, allowing for efficient multi-cloud operation. Read more on how to install Prometheus on Kubernetes and how to use it.

Best Practices for Logs in Kubernetes

Application log aggregation for containerized workloads is an essential best practice that can improve software development. Because of the ephemeral nature of containerized workloads, the number of log entries being generated in a cluster can be quite large.

Logging agents like Fluentd and Fluent Bit (cross-platform, open-source data collection projects originally developed at Treasure Data) are typically deployed as DaemonSets to collect the logs for all pods running on a node, using a privileged volume mount of the log files stored by the container runtime. These cluster-level tools aggregate logs into a data store such as Elasticsearch or send them to a stream processing solution like Kafka. You might also want to use functional role-based monitoring to track these additional pieces of log aggregation infrastructure running outside of your Kubernetes cluster.
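To give a feel for what a node-level agent reads, the sketch below parses log lines in the CRI logging format (timestamp, stream, a P or F tag for partial or full lines, then the message) and joins partial lines back together. It assumes the container runtime writes this format; other setups, such as Docker's json-file driver, use a different layout. The sample lines and function names are hypothetical.

```python
def parse_cri_line(line):
    """Split one CRI-format log line into its four fields."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        raise ValueError(f"not a CRI log line: {line!r}")
    timestamp, stream, tag = parts[:3]
    message = parts[3] if len(parts) == 4 else ""
    return {
        "timestamp": timestamp,
        "stream": stream,          # "stdout" or "stderr"
        "partial": tag == "P",     # "P" = partial line, "F" = full line
        "message": message,
    }


def merge_partials(lines):
    """Join partial lines per stream; return [(stream, full message)]."""
    merged = []
    pending = {}  # stream -> fragments collected so far
    for line in lines:
        entry = parse_cri_line(line)
        pending.setdefault(entry["stream"], []).append(entry["message"])
        if not entry["partial"]:
            fragments = pending.pop(entry["stream"])
            merged.append((entry["stream"], "".join(fragments)))
    return merged


# Hypothetical log file content.
SAMPLE_LINES = [
    "2024-01-01T00:00:00.000000001Z stdout P first half, ",
    "2024-01-01T00:00:00.000000002Z stdout F second half",
    "2024-01-01T00:00:01.000000000Z stderr F error: boom",
]

if __name__ == "__main__":
    for stream, message in merge_partials(SAMPLE_LINES):
        print(stream, message)
```

Agents such as Fluentd and Fluent Bit ship their own parsers for this, so the sketch is only meant to show the structure of the data they aggregate.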

Use a Kubernetes Monitoring Solution

Visibility is essential for enterprises to identify container issues that impede application performance. You can monitor containerized applications running inside Kubernetes pods more efficiently and scale up or down depending on the need. This is why it is critical to have a comprehensive Kubernetes monitoring solution that will give you end-to-end visibility into each and every component of your applications: pods, nodes, containers, infrastructure, the Kubernetes platform, each microservice, and end-user devices.

Monitor Kubernetes with APM

Implementing an application performance monitoring (APM) solution gives enterprises visibility into their applications and allows them to assess overall performance. It organizes and offers insights into Kubernetes clusters, Docker containers, and containerized applications. You can examine the infrastructure’s fundamental metrics, learn about potential impediments, and make adjustments.

Get instant visibility into memory, CPU, and network utilization, as well as resource usage statistics, when deploying APM-monitored container applications. APM metrics quickly identify common issues such as bandwidth-monopolizing applications and far-reaching container-level network errors.

With these tips and monitoring strategies, operators can take huge leaps forward in gaining visibility into their container-based infrastructure and embrace multi-cloud operation with confidence. Want help getting started? Contact one of our experts today.

Article written in collaboration with Jef Spaleta, Principal Developer Advocate at Sensu.