Observability on K8s — DataDog Autodiscovery and DogStatsD

Anh Tu (James) Nguyen · Published in Geek Culture · May 13, 2021

In the previous article, I introduced DataDog and how to deploy it to a Kubernetes cluster. Today I will share the Autodiscovery feature in DataDog, and how we can send custom metrics to the DogStatsD socket.

Requirements

  • A Kubernetes cluster with the DataDog Agent deployed, as described in the previous article.

What is DataDog Autodiscovery?

According to the official DataDog documentation:

When you are monitoring a containerized infrastructure, one challenge that arises is that containers can shift from host to host. The dynamic nature of containerized systems makes them difficult to manually monitor.

To solve this issue, you can use Datadog’s Autodiscovery feature to automatically identify the services running on a specific container and gather data from those services. Whenever a container starts, the Datadog Agent identifies which services are running on this new container, looks for the corresponding monitoring configuration, and starts to collect metrics.

DataDog’s Autodiscovery helps:

  • Automatically identify the services running on a specific container and gather detailed metrics from those services — wherever they may be running.
  • Continuously monitor all your services, no matter how dynamic or ephemeral the underlying infrastructure may be.
  • Let you define configuration templates for the Agent checks and specify which containers each check should apply to.
  • Continuously watch for Docker events, like container creation/destruction/starts/stops, then enable/disable/regenerate static check configurations on such events, and report collected metrics.

Setup Autodiscovery

In Kubernetes, the DataDog Agent pods (deployed as a DaemonSet) enable Autodiscovery by default; we just need to verify that the following environment variable is set:

➜  ~ kubectl --namespace monitoring exec -it datadog-ja4fx -- env | grep KUBERNETES
...
KUBERNETES=yes
...

Once the feature is enabled, the DataDog Agent automatically attempts Autodiscovery for a number of services, based on the default Autodiscovery configuration files.
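
These default configurations ship with the Agent as auto_conf.yaml files under conf.d/<CHECK_NAME>.d/. As a rough illustration (the exact contents vary by Agent version), the Redis one looks something like this, matching any container whose image name is redis:

# conf.d/redisdb.d/auto_conf.yaml (illustrative sketch only)
ad_identifiers:
  - redis              # apply this check to containers running the "redis" image
init_config:
instances:
  - host: "%%host%%"   # Autodiscovery template variable, resolved to the container IP
    port: "6379"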

An integration template can be defined in multiple forms: as Kubernetes pod annotations, Docker labels, a configuration file mounted within the Agent, a ConfigMap, or key-value stores. For example:

apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.check_names: '[<INTEGRATION_NAME>]'
    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.init_configs: '[<INIT_CONFIG>]'
    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.instances: '[<INSTANCE_CONFIG>]'
  # (...)
spec:
  containers:
    - name: '<CONTAINER_IDENTIFIER>'
# (...)

Where:

  • <CONTAINER_IDENTIFIER> is the name of the container that exposes the metrics.
  • <INTEGRATION_NAME> is the name of the DataDog integration, e.g. apache, redis, or openmetrics.
  • <INIT_CONFIG> contains the configuration parameters listed under init_config: in your conf.yaml, required for any integration you're enabling.
  • <INSTANCE_CONFIG> contains the configuration parameters listed under instances: in your conf.yaml, required for any integration you're enabling. A filled-in example follows below.
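
For instance, a hypothetical pod running a container named redis could enable the redisdb integration with annotations like these (the image and port are assumptions for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: redis
  annotations:
    ad.datadoghq.com/redis.check_names: '["redisdb"]'
    ad.datadoghq.com/redis.init_configs: '[{}]'
    ad.datadoghq.com/redis.instances: '[{"host": "%%host%%", "port": "6379"}]'
spec:
  containers:
    - name: redis
      image: redis:latest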

For other integration template types, refer to Kubernetes Integrations Autodiscovery.

To see what DataDog integrations look like, please refer to the integrations-core GitHub repository.

Sending metrics using DataDog’s Autodiscovery

In this example, I will show how to configure cluster-autoscaler (CA) to send metrics to DataDog using the Autodiscovery feature.

Cluster Autoscaler (CA) exposes its metrics via a metrics endpoint in OpenMetrics format. We will use the DataDog OpenMetrics integration to let the Agent collect metrics from that endpoint and forward them to DataDog.

Follow these instructions to prepare the prerequisite configuration for CA.
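
The exact values depend on your cluster, but for the AWS cluster-autoscaler chart the base autoscaler-values.yaml typically sets at least the region and the cluster name used to auto-discover node groups. A placeholder sketch (the names below are made up, adjust to your environment):

# autoscaler-values.yaml (placeholder sketch, adjust to your environment)
awsRegion: ap-southeast-1

autoDiscovery:
  clusterName: my-eks-cluster   # the EKS cluster whose node groups CA should scale

rbac:
  create: true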

Then append this configuration block to the values.yaml of your deployment:

...
podAnnotations:
  ad.datadoghq.com/aws-cluster-autoscaler.check_names: '["openmetrics"]'
  ad.datadoghq.com/aws-cluster-autoscaler.init_configs: '[{}]'
  ad.datadoghq.com/aws-cluster-autoscaler.instances: |
    [
      {
        "prometheus_url": "http://%%host%%:8085/metrics",
        "namespace": "kubernetes",
        "metrics": ["*"],
        "ignore_metrics":
        [
          "go_*"
        ]
      }
    ]
...
serviceMonitor:
  enabled: true
...

To explain the above configuration, which the Agent renders into a conf.yaml (sample file; a sketch of the rendered result is shown after the list):

  1. <CONTAINER_IDENTIFIER> is aws-cluster-autoscaler.
  2. <INTEGRATION_NAME> is openmetrics.
  3. <INIT_CONFIG> is [{}], i.e. an empty init config.
  4. <INSTANCE_CONFIG> has these settings:
  • "prometheus_url": "http://%%host%%:8085/metrics": the metrics endpoint of CA; the %%host%% template variable auto-detects the network and resolves to the IP address of the CA pod. Please check out Autodiscovery Template Variables for the full list of supported variables.
  • "namespace": "kubernetes" prefixes all collected metrics with kubernetes.*.
  • "metrics": ["*"] and "ignore_metrics": ["go_*"] collect all the supported metrics of CA, except those matching go_*.
  5. serviceMonitor.enabled: true creates a Prometheus Operator ServiceMonitor and exposes the /metrics endpoint for CA.
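
For reference, the annotations above are roughly equivalent to the following openmetrics check configuration that the Agent generates internally; this is a sketch, not a file you need to create yourself:

# Rough equivalent of the rendered openmetrics.d/conf.yaml
init_config:

instances:
  - prometheus_url: "http://<CA_POD_IP>:8085/metrics"   # %%host%% resolved at runtime
    namespace: "kubernetes"
    metrics:
      - "*"
    ignore_metrics:
      - "go_*"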

Run helm install to deploy the chart into your cluster:

➜  ~ helm repo add autoscaler https://kubernetes.github.io/autoscaler
➜ ~ helm install autoscaler --namespace monitoring -f autoscaler-values.yaml autoscaler/cluster-autoscaler

Verify the deployment:

➜  ~ kubectl --namespace monitoring get pods -l app.kubernetes.io/instance=autoscaler -o wide --no-headers
autoscaler-aws-cluster-autoscaler-7bre6732d6-czpkf 1/1 Running 1 30s 10.123.45.67 ip-10-123-45-144.ap-southeast-1.compute.internal <none> <none>

We can see CA is deployed and its pod is running on node ip-10-123-45-144.ap-southeast-1.compute.internal.

Here is how Kubernetes Integration Autodiscovery works in this example:

  • The OpenMetrics check has been packaged with the DataDog Agent since version 6.6.0, so the Agent pods automatically search all pod annotations for that integration template.
  • In the CA Helm values.yaml above, we defined its podAnnotations with the openmetrics integration configuration, so the Agent can now start checking and collecting metrics from CA.
  • First, we need to find the DataDog Agent pod running on the same node as the CA pod:
➜  ~ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=ip-10-123-45-144.ap-southeast-1.compute.internal | grep -E "autoscaler|datadog"
monitoring autoscaler-aws-cluster-autoscaler-7bre6732d6-czpkf 1/1 Running 1 120s 10.123.45.67 ip-10-123-45-144.ap-southeast-1.compute.internal <none> <none>
monitoring datadog-ja4fx 1/1 Running 0 1d 10.123.45.78 ip-10-123-45-144.ap-southeast-1.compute.internal <none> <none>
  • Verify the integration in the Agent pod, in this case datadog-ja4fx:
➜  ~ kubectl --namespace monitoring exec -it datadog-ja4fx -- agent check openmetrics
...
=== Series ===
{
  "series": [
    {
      "metric": "kubernetes.cluster_autoscaler_function_duration_seconds.count",
      "points": [
        [
          1620938471,
          6001
        ]
      ],
      "tags": [
        "docker_image:k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.4",
        "function:main",
        "image_name:k8s.gcr.io/autoscaling/cluster-autoscaler",
        "image_tag:v1.17.4",
        "kube_app:aws-cluster-autoscaler",
        "kube_app_instance:autoscaler",
        "kube_app_name:aws-cluster-autoscaler",
        "kube_container_name:aws-cluster-autoscaler",
        "kube_deployment:autoscaler-aws-cluster-autoscaler",
        "kube_namespace:monitoring",
        "kube_replica_set:autoscaler-aws-cluster-autoscaler-7bre6732d6",
        "kube_service:autoscaler-aws-cluster-autoscaler",
        "pod_name:autoscaler-aws-cluster-autoscaler-7bre6732d6-czpkf",
        "pod_phase:running",
        "short_image:cluster-autoscaler",
        "upper_bound:15.0"
      ],
      "host": "i-01234abcdxyz",
      "type": "gauge",
      "interval": 0,
      "source_type_name": "System"
    },
...
=== Service Checks ===
[
  {
    "check": "kubernetes.prometheus.health",
    "host_name": "i-01234abcdxyz",
    "timestamp": 1620938470,
    "status": 0,
    "message": "",
    "tags": [
      "docker_image:k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.4",
      "endpoint:http://10.123.45.67:8085/metrics",
      "image_name:k8s.gcr.io/autoscaling/cluster-autoscaler",
      "image_tag:v1.17.4",
      "kube_app:aws-cluster-autoscaler",
      "kube_app_instance:autoscaler",
      "kube_app_name:aws-cluster-autoscaler",
      "kube_container_name:aws-cluster-autoscaler",
      "kube_deployment:autoscaler-aws-cluster-autoscaler",
      "kube_namespace:monitoring",
      "kube_replica_set:autoscaler-aws-cluster-autoscaler-7bre6732d6",
      "kube_service:autoscaler-aws-cluster-autoscaler",
      "pod_name:autoscaler-aws-cluster-autoscaler-7bre6732d6-czpkf",
      "pod_phase:running",
      "short_image:cluster-autoscaler"
    ]
  }
]
...
=========
Collector
=========
Running Checks
==============
  openmetrics (1.10.0)
  --------------------
    Instance ID: openmetrics:kubernetes:9c6a<>cac [OK]
    Configuration Source: kubelet:docker://9782a<>20dc088
    Total Runs: 1
    Metric Samples: Last Run: 641, Total: 641
    Events: Last Run: 0, Total: 0
    Service Checks: Last Run: 1, Total: 1
    Average Execution Time : 305ms
    Last Execution Date : 2021-05-13 20:41:11.000000 UTC
    Last Successful Execution Date : 2021-05-13 20:41:11.000000 UTC

Nice, we can see the DataDog Agent can detect and collect the metrics. Let's verify the metrics are being sent to DataDog by checking Metrics => Summary:

Metrics Summary — Cluster Autoscaler

DogStatsD

What are custom metrics?

  • If a metric is not submitted from one of the 450+ Datadog integrations, it’s considered a custom metric.
  • Custom metrics help you track your application KPIs: number of visitors, average customer basket size, request latency, or performance distribution for a custom algorithm.
  • A custom metric is identified by a unique combination of a metric’s name and tag values (including the host tag).

In general, any metric you send using DogStatsD or through a custom Agent check is a custom metric.

DogStatsD is a metrics aggregation service bundled with the Datadog Agent. It implements the StatsD protocol and adds a few Datadog-specific extensions:

  • Histogram metric type
  • Service checks
  • Events
  • Tagging

How does DogStatsD work?

  • DogStatsD accepts custom metrics, events, and service checks over UDP and periodically aggregates and forwards them to Datadog.
  • Because it uses UDP, your application can send metrics to DogStatsD and resume its work without waiting for a response. If DogStatsD ever becomes unavailable, your application won’t experience an interruption.
  • By default, DogStatsD listens on UDP port 8125, but you can also configure DogStatsD to use a Unix domain socket. An example datagram is shown below.
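
To make the wire format concrete, here is a hand-crafted datagram sent over UDP with netcat; the metric name custom.demo.page_views, the tag env:dev, and the local Agent address are assumptions for illustration:

# Format is <metric.name>:<value>|<type>|#<tags>; "c" means counter
➜  ~ echo -n "custom.demo.page_views:1|c|#env:dev" | nc -u -w1 127.0.0.1 8125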

Setup DogStatsD

In my previous blog, the datadog-k8s-values.yaml already includes the DogStatsD configuration. I'm using Chart version 1.39.9, so the configuration looks like this:

...
daemonset:
  useHostPort: true
...
datadog:
  ...
  useDogStatsDSocketVolume: True
  dogStatsDSocketPath: "/var/run/datadog/dsd.socket"
  # This exposes the UDP port for applications that do not support the UNIX socket
  nonLocalTraffic: True
...

If you’re using Chart version 2.x.x, refer to this guide to enable DogStatsD.
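
For reference, in the 2.x chart the equivalent settings live under the datadog.dogstatsd section; a rough sketch (double-check the key names against the guide above for your chart version) would be:

# values.yaml sketch for Chart 2.x (verify against the chart version you use)
datadog:
  dogstatsd:
    useHostPort: true
    nonLocalTraffic: true
    useSocketVolume: true
    socketPath: /var/run/datadog/dsd.socket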

Sending metrics using DogStatsD socket

Applications or services can send their metrics to DogStatsD using either the UDP port or the UNIX socket:

  • UDP socket: add this environment variable to your application's Kubernetes deployment:
env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  • UNIX socket: mount the DogStatsD socket into your application's Kubernetes deployment:
volumeMounts:
  - name: dsdsocket
    mountPath: /var/run/datadog
    readOnly: true
...
volumes:
  - hostPath:
      path: /var/run/datadog/
    name: dsdsocket
  • Configure the StatsD settings of your application to point to /var/run/datadog/dsd.socket or to $DD_AGENT_HOST on port 8125.

I have written a small deployment, dogstatsd-demo.yaml, to test sending custom metrics to DogStatsD using netcat.
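
A minimal sketch of what such a manifest could look like, reusing the ConfigMap and Deployment names from the kubectl output below; the alpine image, the apk-installed netcat-openbsd, and the metric name custom.dogstatsd_demo.heartbeat are assumptions for illustration:

# dogstatsd-demo.yaml (illustrative sketch, not the exact manifest)
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics-script
data:
  send-metrics.sh: |
    #!/bin/sh
    # Send a counter to the node-local DogStatsD over UDP every 10 seconds
    while true; do
      echo -n "custom.dogstatsd_demo.heartbeat:1|c|#source:netcat" | nc -u -w1 "$DD_AGENT_HOST" 8125
      sleep 10
    done
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dogstatsd-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dogstatsd-demo
  template:
    metadata:
      labels:
        app: dogstatsd-demo
    spec:
      containers:
        - name: dogstatsd-demo
          image: alpine:3.13
          # netcat-openbsd provides an nc with UDP support; assumes the pod can reach the package mirrors
          command: ["/bin/sh", "-c", "apk add --no-cache netcat-openbsd && sh /scripts/send-metrics.sh"]
          env:
            - name: DD_AGENT_HOST          # node IP where the Agent exposes the DogStatsD host port
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: custom-metrics-script
              mountPath: /scripts
      volumes:
        - name: custom-metrics-script
          configMap:
            name: custom-metrics-script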

Apply the deployment into your cluster:

➜  ~ kubectl apply -f dogstatsd-demo.yaml
configmap/custom-metrics-script created
deployment.apps/dogstatsd-demo created

Wait for the pod to be running, then verify in the DataDog Metrics Explorer:

Metrics Explorer — DogStatsD

Delete the deployment to avoid unexpected custom metrics billing:

➜  ~ kubectl delete -f dogstatsd-demo.yaml
configmap/custom-metrics-script deleted
deployment.apps/dogstatsd-demo deleted

What’s Next

This blog is based on my own experience working with DataDog on an existing EKS setup. It may or may not work in your environment. You can visit the official DataDog documentation for more information. Feel free to leave comments or questions.

I hope you enjoy reading my blog. In the next one, I will share how to send Spark on K8s metrics to DataDog using Autodiscovery.
