Elastic Agent not retrieving kube-state-metrics

I am attempting to set up an Elastic Stack with ECK on k3s and have it monitor the k3s cluster itself via kube-state-metrics. I installed the ECK operator from the official Helm chart, and kube-state-metrics from its chart (both at the latest version), then applied the manifest below to bring up the Elastic cluster. This all works fine: I can log in to Kibana and see that all the Fleet-managed agents are healthy and enrolled. I then modified the Kubernetes integration for the agents to enable kube-state-metrics polling, pointed the URL at the kube-state-metrics service, and applied it. The change gets rolled out to the agents (I can see them adopting the updated policy in the pod logs).
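For reference, the install steps above look roughly like this (a sketch, not the exact commands I ran; chart repos and release names are assumptions, and I am using the prometheus-community chart for kube-state-metrics):

```shell
# Add the Helm repos for the ECK operator and kube-state-metrics (assumed repos)
helm repo add elastic https://helm.elastic.co
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the ECK operator into its own namespace
helm install elastic-operator elastic/eck-operator \
  -n elastic-system --create-namespace

# Install kube-state-metrics
helm install kube-state-metrics prometheus-community/kube-state-metrics \
  -n kube-system

# Then apply the Elastic stack manifest below
kubectl apply -f elastic-stack.yaml
```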

However, no data appears in Elasticsearch or Kibana, and no errors show up in the pod logs. I also see no logs at all for the agents under Fleet management in Kibana, which itself seems a bit odd.

The problem is that I have no idea how to debug this and find out what is going wrong, since nothing appears in the agent pod logs. Can anyone give me any tips here?
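For context, here is what I have tried so far to get more state out of the agents than the pod logs give me (a sketch; the pod name placeholder and the kube-state-metrics service URL are assumptions that depend on your cluster):

```shell
# List the agent pods (label comes from the ECK-managed DaemonSet)
kubectl get pods -n elastic -l agent.k8s.elastic.co/name=elastic-agent

# Ask the agent itself for the status of its components
kubectl exec -n elastic <agent-pod> -- elastic-agent status

# Dump the configuration/policy the agent is actually running
kubectl exec -n elastic <agent-pod> -- elastic-agent inspect

# Confirm the kube-state-metrics endpoint is reachable from inside the cluster
# (service name/namespace assumed from a default kube-state-metrics install)
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://kube-state-metrics.kube-system.svc:8080/metrics
```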

apiVersion: v1
kind: Namespace
metadata:
  name: elastic
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
  namespace: elastic
spec:
  version: 8.1.0
  kibanaRef:
    name: kibana
  elasticsearchRefs:
    - name: elasticsearch
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
  namespace: elastic
spec:
  version: 8.1.0
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: elastic
spec:
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  version: 8.1.0
  count: 1
  elasticsearchRef:
    name: elasticsearch
  config:
    xpack.fleet.agents.elasticsearch.hosts:
      ["https://elasticsearch-es-http.elastic.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts:
      ["https://fleet-server-agent-http.elastic.svc:8220"]
    xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
      - name: kubernetes
        # pinning this version as the next one introduced a kube-proxy host setting default that breaks this recipe,
        # see https://github.com/elastic/integrations/pull/1565 for more details
        version: 0.14.0
    xpack.fleet.agentPolicies:
      - name: Fleet Server on ECK policy
        id: eck-fleet-server
        is_default_fleet_server: true
        namespace: elastic
        monitoring_enabled:
          - logs
          - metrics
        package_policies:
          - name: fleet_server-1
            id: fleet_server-1
            package:
              name: fleet_server
      - name: Elastic Agent on ECK policy
        id: eck-agent
        namespace: elastic
        monitoring_enabled:
          - logs
          - metrics
        unenroll_timeout: 900
        is_default: true
        package_policies:
          - name: system-1
            id: system-1
            package:
              name: system
          - package:
              name: kubernetes
            name: kubernetes-1
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana-ingress
  namespace: elastic
spec:
  ingressClassName: nginx
  rules:
    - host: kibana.n12.eu
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: kibana-kb-http
                port:
                  number: 5601
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elastic
spec:
  version: 8.1.0
  nodeSets:
    - name: elastic
      count: 5
      config:
        node.roles: ["master", "data", "ingest", "ml"]
        node.store.allow_mmap: false
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                limits:
                  memory: 8Gi
                  cpu: 4
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms4g -Xmx4g"
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
            storageClassName: synology-iscsi-storage
  http:
    tls:
      selfSignedCertificate:
        disabled: true
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elasticsearch-ingress
  namespace: elastic
spec:
  ingressClassName: nginx
  rules:
    - host: elasticsearch.n12.eu
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: elasticsearch-es-http
                port:
                  number: 9200
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: elastic
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

OK, so I figured out what was going wrong here. Because Elasticsearch is deployed with the self-signed certificate disabled, the agents could not ship their metrics to the Elasticsearch instance over https and simply failed silently (note to the dev team: it would help to surface this failure in the pod logs by default). The trick with this config is to set the Elasticsearch output URL to http while keeping the Fleet Server URL on https.
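Concretely, the fix amounts to changing the scheme of one line in the Kibana config from the manifest above; everything else stays as posted:

```yaml
# http, because selfSignedCertificate is disabled on the Elasticsearch resource
xpack.fleet.agents.elasticsearch.hosts:
  ["http://elasticsearch-es-http.elastic.svc:9200"]
# Fleet Server keeps its own TLS, so this one stays on https
xpack.fleet.agents.fleet_server.hosts:
  ["https://fleet-server-agent-http.elastic.svc:8220"]
```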