Hi all,
I'm in the middle of setting up the Elastic stack with ECK on a Kubernetes cluster on Azure, and I've hit a small snag with Metricbeat. Hoping someone can push me in the right direction.
The setup
I've got Elastic and related deployments running in a dedicated namespace on two dedicated nodes.
Both Elasticsearch and Kibana are up and running without errors. The initial deployment was done with the Elastic Operator for Kubernetes and is secured accordingly.
The problem
I've added Metricbeat to the mix. Metricbeat uses a DaemonSet to collect metrics from the pods/nodes/etc. and a Deployment to collect cluster-state metrics. After a bit of trial and error on the security side, the output from the DaemonSet pods is working fine, but I'm not receiving anything from the state-metrics Deployment.
The Metricbeat pod for the Deployment seems to come up without any errors and appears to be collecting metrics:
2019-12-04T08:07:13.195Z INFO instance/beat.go:292 Setup Beat: metricbeat; Version: 7.4.2
2019-12-04T08:07:13.196Z INFO elasticsearch/client.go:170 Elasticsearch url: https://elastic-es-http:9200
2019-12-04T08:07:13.196Z INFO [publisher] pipeline/module.go:97 Beat name: metricbeat-54f645684f-xn646
2019-12-04T08:07:13.198Z INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2019-12-04T08:07:13.198Z INFO instance/beat.go:422 metricbeat start running.
2019-12-04T08:07:13.198Z INFO cfgfile/reload.go:171 Config reloader started
2019-12-04T08:07:13.199Z INFO cfgfile/reload.go:226 Loading of config files completed.
2019-12-04T08:07:43.200Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":27}},"total":{"ticks":100,"time":{"ms":110},"value":100},"user":{"ticks":80,"time":{"ms":83}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":8},"info":{"ephemeral_id":"126e8a4b-6dbd-459e-bfc1-93d1ecdb7b3d","uptime":{"ms":30263}},"memstats":{"gc_next":9569520,"memory_alloc":5648288,"memory_total":15310032,"rss":52699136},"runtime":{"goroutines":30}},"libbeat":{"config":{"module":{"running":0},"reloads":1},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":0}}},"system":{"cpu":{"cores":2},"load":{"1":0.43,"15":0.75,"5":0.58,"norm":{"1":0.215,"15":0.375,"5":0.29}}}}}}
I've restarted the kube-state-metrics pod in kube-system just to be on the safe side. It too seems to start up without errors:
I1203 16:21:25.679571 1 main.go:184] Testing communication with server
I1203 16:21:25.722699 1 main.go:189] Running with Kubernetes cluster version: v1.14. git version: v1.14.8. git tree state: clean. commit: 1da9875156ba0ad48e7d09a5d00e41489507f592. platform: linux/amd64
I1203 16:21:25.722726 1 main.go:191] Communication with server successful
I1203 16:21:25.722915 1 main.go:225] Starting metrics server: 0.0.0.0:8080
I1203 16:21:25.723261 1 main.go:200] Starting kube-state-metrics self metrics server: 0.0.0.0:8081
I1203 16:21:25.723348 1 metrics_handler.go:96] Autosharding disabled
I1203 16:21:25.724509 1 builder.go:144] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses
All the shards in Elasticsearch are status green, and I'm not seeing any connection or index errors (or any errors, for that matter) in the Elasticsearch logs.
In short, all seems fine, except that no state_* data is coming in. In fact, if I delete the Metricbeat DaemonSet, no data comes into the Metricbeat index at all, so nothing appears to be arriving from the Deployment pod.
Configurations
The configuration for the Metricbeat Deployment is mostly standard, with SSL/auth added.
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-config
  namespace: elastic
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Reload module configs as they change:
      reload.enabled: false
    processors:
      - add_cloud_metadata:
      - add_kubernetes_metadata:
          in_cluster: true
    setup.ilm.enabled: false
    output.elasticsearch:
      hosts: ['https://elastic-es-http:9200']
      ssl.certificate_authorities: ["/usr/share/elastic/certs/ca.crt"]
      ssl.certificate: '/usr/share/elastic/certs/tls.crt'
      ssl.key: '/usr/share/elastic/certs/tls.key'
      username: '{username}'
      password: '{password}'
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-modules
  namespace: elastic
  labels:
    k8s-app: metricbeat
data:
  # This module requires `kube-state-metrics` up and running in the `kube-system` namespace
  kubernetes.yml: |-
    - module: kubernetes
      labels.dedot: true
      annotations.dedot: true
      metricsets:
        - state_node
        - state_deployment
        - state_replicaset
        - state_pod
        - state_container
        - state_statefulset
        # Uncomment this to get k8s events:
        - event
      period: 10s
      hosts: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]
      add_metadata: true
      in_cluster: true
      enabled: true
---
# Deploy a singleton instance in the whole cluster for some unique data sources, like kube-state-metrics
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: metricbeat
  namespace: elastic
  labels:
    k8s-app: metricbeat
spec:
  template:
    metadata:
      creationTimestamp: ~
      labels:
        k8s-app: metricbeat
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: agentpool
                    operator: In
                    values:
                      - elastic
      containers:
        - args:
            - "-c"
            - /etc/metricbeat.yml
            - "-e"
          image: "docker.elastic.co/beats/metricbeat-oss:7.4.2"
          imagePullPolicy: IfNotPresent
          name: metricbeat
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          securityContext:
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /usr/share/elastic/certs/
              name: elastic-internal-http-certificates
              readOnly: true
            - mountPath: /etc/metricbeat.yml
              name: config
              readOnly: true
              subPath: metricbeat.yml
            - mountPath: /usr/share/metricbeat/modules.d
              name: modules
              readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: metricbeat
      serviceAccountName: metricbeat
      terminationGracePeriodSeconds: 30
      tolerations:
        - effect: NoSchedule
          key: restriction
          operator: Equal
          value: elastic
      volumes:
        - configMap:
            defaultMode: 384
            name: metricbeat-deployment-config
          name: config
        - name: elastic-internal-http-certificates
          secret:
            defaultMode: 420
            optional: false
            secretName: elastic-es-http-certs-internal
        - configMap:
            defaultMode: 384
            name: metricbeat-deployment-modules
          name: modules
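For completeness: the metricbeat ServiceAccount is bound to a cluster-wide read role along the lines of the standard Metricbeat manifest. This is a sketch from that manifest, so my actual rule list may differ slightly:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metricbeat
rules:
  - apiGroups: [""]
    resources: ["nodes", "namespaces", "events", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["replicasets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
  - kind: ServiceAccount
    name: metricbeat
    namespace: elastic
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
```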
I initially ran the Deployment in the kube-system namespace and the DaemonSet in the elastic namespace. As that didn't work, I've now moved everything into the elastic namespace and am targeting the kube-state-metrics service in kube-system via "kube-state-metrics.kube-system.svc.cluster.local:8080".
In both cases the effect was the same: no visible errors in any of the logs, but no state_* metrics either.
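To try to isolate this further, these are the kinds of checks I was planning to run next. This is just a sketch: the pod name is an example from my logs, and I'm assuming curl is available inside the Metricbeat image.

```shell
# Open a shell in the state-metrics Deployment pod (pod name is an example):
kubectl -n elastic exec -it metricbeat-54f645684f-xn646 -- bash

# Inside the pod: is the kube-state-metrics endpoint resolvable and reachable?
# (assumes curl exists in the image)
curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | head

# Re-run Metricbeat in the foreground with debug logging for the kubernetes
# module, to see whether the state_* metricsets actually start:
metricbeat -e -c /etc/metricbeat.yml -d "kubernetes"
```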
Does anyone have any ideas about what could be going on, and/or suggestions for what I can do to isolate the cause?
Kind regards,
Chris