Elastic Agent installed via K8s is not healthy

1. I installed the Elastic Agent using K8s and checked whether the pod is running; it is (checked as shown below).
2. But it does not show as healthy in Kibana.
3. Viewing the detailed agent information shows that there is an issue with my integration.


4. But my integration has already been fully enabled. How can I resolve this?
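For step 1, this is roughly how I checked the pod (using the kube-system namespace and the app=elastic-agent label from my manifest):

# list the Elastic Agent DaemonSet pods and confirm they are Running
kubectl get pods -n kube-system -l app=elastic-agent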

Hi @L1NG

Unfortunately I can't read your screenshot.

Can you find the latest policy response document from the affected Endpoint and share the entries from the Endpoint.policy.applied.actions list where the status field is not success? You can DM me the entire document if it's not clear what portions of it I'm asking for.

Endpoint's policy response document can be found in the metrics-* index pattern and will match the KQL query event.dataset : endpoint.policy. If you need help finding it in your Kibana, let me know and I'll share instructions on how to do that.
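If the command line is easier than Discover, a direct Elasticsearch query along these lines should also return it (the URL and credentials below are placeholders for your deployment, not values I know):

# fetch the most recent Endpoint policy response document
curl -s -u elastic:<password> "https://<es-host>:9200/metrics-*/_search" \
  -H 'Content-Type: application/json' \
  -d '{"size": 1, "sort": [{"@timestamp": "desc"}], "query": {"term": {"event.dataset": "endpoint.policy"}}}'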

Hi, thank you for your reply.

As you suggested, I searched for event.dataset : endpoint.policy, and it looks normal:

event.dataset: endpoint.policy
@timestamp: Jun 30, 2023 @ 09:31:04.096
agent.build.original: version: 8.8.1, compiled: Sat Jun 3 01:00:00 2023, branch: 8.8, commit: a4a4ae26ff6a2e57b107baa1c8d49f43fe1cd77c
agent.id: d01d58e2-d08b-4e15-ab78-0aba22513baa
agent.type: endpoint
agent.version: 8.8.1
data_stream.dataset: endpoint.policy
data_stream.namespace: default
data_stream.type: metrics
ecs.version: 1.11.0
elastic.agent.id: d01d58e2-d08b-4e15-ab78-0aba22513baa
Endpoint.configuration.isolation: false
Endpoint.policy.applied.actions: [ { "name": [ "configure_memory_threat" ], "message": [ "Successfully enabled memory threat prevention with memory scanning enabled" ], "status": [ "success" ] }, { "name": [ "configure_diagnostic_memory_threat" ], "message": [ "Successfully enabled memory threat detection with memory scanning enabled" ], "status": [ "success" ] }, { "name": [ "configure_host_isolation" ], "message": [ "Host isolation is not supported" ], "status": [ "unsupported" ] }, { "name": [ "configure_malicious_behavior" ], "message": [ "Enabled 14 out of 14 malicious behavior rules" ], "status": [ "success" ] }, { "name": [ "configure_diagnostic_malicious_behavior" ], "message": [ "Enabled 26 out of 26 diagnostic malicious behavior rules" ], "status": [ "success" ] }, { "name": [ "configure_user_notification" ], "message": [ "Successfully configured user notification" ], "status": [ "success" ] }, { "name": [ "configure_malware" ], "message": [ "Successfully enabled malware prevention" ], "status": [ "success" ] }, { "name": [ "configure_diagnostic_malware" ], "message": [ "Successfully enabled malware detection" ], "status": [ "success" ] }, { "name": [ "configure_output" ], "message": [ "Successfully configured output connection" ], "status": [ "success" ] }, { "name": [ "configure_logging" ], "message": [ "Successfully configured logging" ], "status": [ "success" ] }, { "name": [ "load_config" ], "message": [ "Successfully parsed configuration" ], "status": [ "success" ] }, { "name": [ "download_user_artifacts" ], "message": [ "Successfully downloaded user artifacts" ], "status": [ "success" ] }, { "name": [ "download_global_artifacts" ], "message": [ "Global artifacts are available for use" ], "status": [ "success" ] }, { "name": [ "detect_process_events" ], "message": [ "Success enabling process events; current state is enabled" ], "status": [ "success" ] }, { "name": [ "detect_network_events" ], "message": [ "Success enabling network events; current state is enabled" ], "status": [ "success" ] }, { "name": [ "detect_file_write_events" ], "message": [ "Success enabling file events; current state is enabled" ], "status": [ "success" ] }, { "name": [ "configure_file_events" ], "message": [ "Success enabling file events; current state is enabled" ], "status": [ "success" ] }, { "name": [ "configure_network_events" ], "message": [ "Success enabling network events; current state is enabled" ], "status": [ "success" ] }, { "name": [ "configure_process_events" ], "message": [ "Success enabling process events; current state is enabled" ], "status": [ "success" ] }, { "name": [ "configure_response_actions" ], "message": [ "Successfully configured fleet API for response actions" ], "status": [ "success" ] }, { "name": [ "agent_connectivity" ], "message": [ "Successfully connected to Agent" ], "status": [ "success" ] }, { "name": [ "workflow" ], "message": [ "Successfully executed all workflows" ], "status": [ "success" ] } ]
Endpoint.policy.applied.artifacts.global.identifiers: [ { "sha256": [ "f61fe1822773e96148d7ce0e92c2dade015ab712df1238f70a2fa5865abdddd6" ], "name": [ "diagnostic-configuration-v1" ] }, { "sha256": [ "39fecb66f9337eb33f5c0359f51ad37761ff13e4a7c4be390e03d2c227ac7cf6" ], "name": [ "diagnostic-endpointelf-v1-blocklist" ] }, { "sha256": [ "e3eb12da99e044ecc7d50cea407bf17f33c546e5309aa7ee661234baed2b7750" ], "name": [ "diagnostic-endpointelf-v1-exceptionlist" ] }, { "sha256": [ "885020b5bb99b3b875f51678efae67874bae37bfcc0036ad86bd2f7cbf767824" ], "name": [ "diagnostic-endpointelf-v1-model" ] }, { "sha256": [ "e3c6d2e3dc54a965baa006d70fa65038f4efdd70c46fd44d833601a55b3f86c4" ], "name": [ "diagnostic-malware-signature-v1-linux" ] }, { "sha256": [ "446787594b72b874c5702fb32bb69ffb34699df620e3d5bb8e213776379a4b3e" ], "name": [ "diagnostic-rules-linux-v1" ] }, { "sha256": [ "0d4754c43a899fb1e8389d36e95c87b1ed852661fc007041d41b45929a3b34f4" ], "name": [ "endpointelf-v1-blocklist" ] }, { "sha256": [ "eb9689f4e89f0b8b88f6fde235f1d5d9329c3056a21e6f451e36f23604ff8394" ], "name": [ "endpointelf-v1-exceptionlist" ] }, { "sha256": [ "ae9943982909af94f2bef6f2418b103935ac731db362dd74de9bfe4b490c61cf" ], "name": [ "endpointelf-v1-model" ] }, { "sha256": [ "dce6405f0bec1628f3645cfc04b648490ebfec01dcd89af3d68ea243c8a25349" ], "name": [ "global-configuration-v1" ] }, { "sha256": [ "d309bfb8fb555c9d3fba65ce7db66f46a0a14021db0cdc8c015eaf35c011e2dc" ], "name": [ "global-eventfilterlist-linux-v1" ] }, { "sha256": [ "f7b656e62d927b5adad3cb2071adfe7b87f999842a913ff0891c31bf58131732" ], "name": [ "global-exceptionlist-linux" ] }, { "sha256": [ "9365c603590018c969300dfaec7f8758443f03b0e07a29087cfa19dd78298593" ], "name": [ "global-trustlist-linux-v1" ] }, { "sha256": [ "04e65bc253dbeeb9ad8c799616e88b3847c787a258c6c592dea79c111780be46" ], "name": [ "production-malware-signature-v1-linux" ] }, { "sha256": [ "67f1b24dfdd691a204b8343cdf973ac14a81a879f468ec4ffd166bb71cd98e68" ], "name": [ "production-rules-linux-v1" ] } ]
Endpoint.policy.applied.artifacts.global.version: 1.0.649
Endpoint.policy.applied.artifacts.user.identifiers: [ { "sha256": [ "d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658" ], "name": [ "endpoint-blocklist-linux-v1" ] }, { "sha256": [ "d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658" ], "name": [ "endpoint-eventfilterlist-linux-v1" ] }, { "sha256": [ "d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658" ], "name": [ "endpoint-exceptionlist-linux-v1" ] }, { "sha256": [ "d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658" ], "name": [ "endpoint-hostisolationexceptionlist-linux-v1" ] }, { "sha256": [ "d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658" ], "name": [ "endpoint-trustlist-linux-v1" ] } ]
Endpoint.policy.applied.artifacts.user.version: 1.0.3
Endpoint.policy.applied.endpoint_policy_version: 1
Endpoint.policy.applied.id: 0b5cff70-16e5-11ee-a56b-b99facd25136
Endpoint.policy.applied.name: Elastic Defend1
Endpoint.policy.applied.response.configurations.behavior_protection.concerned_actions: [agent_connectivity, load_config, workflow, download_global_artifacts, download_user_artifacts, configure_file_events, configure_network_events, configure_process_events, configure_malicious_behavior]
Endpoint.policy.applied.response.configurations.behavior_protection.status: success
Endpoint.policy.applied.response.configurations.events.concerned_actions: [agent_connectivity, load_config, workflow, download_global_artifacts, download_user_artifacts, detect_process_events, detect_file_write_events, detect_network_events, configure_file_events, configure_network_events, configure_process_events]
Endpoint.policy.applied.response.configurations.events.status: success
Endpoint.policy.applied.response.configurations.host_isolation.concerned_actions: [agent_connectivity, configure_host_isolation, load_config, workflow]
Endpoint.policy.applied.response.configurations.host_isolation.status: unsupported
Endpoint.policy.applied.response.configurations.logging.concerned_actions: [agent_connectivity, load_config, configure_logging, workflow]
Endpoint.policy.applied.response.configurations.logging.status: success
Endpoint.policy.applied.response.configurations.malware.concerned_actions: [agent_connectivity, load_config, workflow, download_global_artifacts, download_user_artifacts, configure_malware, detect_process_events, detect_file_write_events, configure_user_notification]
Endpoint.policy.applied.response.configurations.malware.status: success
Endpoint.policy.applied.response.configurations.memory_protection.concerned_actions: [agent_connectivity, configure_memory_threat, configure_process_events, download_global_artifacts, download_user_artifacts, workflow, load_config, detect_process_events]
Endpoint.policy.applied.response.configurations.memory_protection.status: success
Endpoint.policy.applied.response.configurations.response_actions.concerned_actions: configure_response_actions
Endpoint.policy.applied.response.configurations.response_actions.status: success
Endpoint.policy.applied.response.configurations.streaming.concerned_actions: [agent_connectivity, load_config, configure_output, workflow]
Endpoint.policy.applied.response.configurations.streaming.status: success
Endpoint.policy.applied.response.diagnostic.behavior_protection.concerned_actions: [load_config, workflow, download_global_artifacts, download_user_artifacts, configure_file_events, configure_network_events, configure_process_events, configure_diagnostic_malicious_behavior]
Endpoint.policy.applied.response.diagnostic.behavior_protection.status: success
Endpoint.policy.applied.response.diagnostic.malware.concerned_actions: [load_config, workflow, download_global_artifacts, download_user_artifacts, configure_diagnostic_malware, detect_process_events, detect_file_write_events]
Endpoint.policy.applied.response.diagnostic.malware.status: success
Endpoint.policy.applied.response.diagnostic.memory_protection.concerned_actions: [load_config, workflow, download_global_artifacts, download_user_artifacts, detect_process_events, configure_process_events, configure_diagnostic_memory_threat]
Endpoint.policy.applied.response.diagnostic.memory_protection.status: success
Endpoint.policy.applied.status: success
Endpoint.policy.applied.version: 2
Endpoint.state.isolation: false
event.action: endpoint_policy_response
event.agent_id_status: verified
event.category: host
event.created: Jun 30, 2023 @ 09:31:04.096
event.id: N7sjEsU104E/Ji49+++++Ysp
event.ingested: Jun 30, 2023 @ 09:31:05.000
event.kind: state
event.module: endpoint
event.sequence: 30,869
event.type: change
host.architecture: x86_64
host.hostname: vm-chb1kqq8j5gjs0ttb9j0
host.id: 9e065f0961d84ecf8bec2457d927e012
host.ip: [127.0.0.1, ::1, 10.122.131.98, fe80::200:ff:fe4a:3a9a]
host.mac: 00:00:00:4a:3a:9a
host.name: vm-chb1kqq8j5gjs0ttb9j0
host.os.Ext.variant: CentOS
host.os.family: centos
host.os.full: CentOS 7.9.2009
host.os.kernel: 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022
host.os.name: Linux
host.os.platform: centos
host.os.type: linux
host.os.version: 7.9.2009
message: Endpoint policy change
_id: l3bsCYkBbsq_WlIN3K-q
_index: .ds-metrics-endpoint.policy-default-2023.06.30-000001
_score:
The integration mode I am using is Cloud Workloads.

I suspect that installing the Elastic Agent as a pod means it cannot collect host log information. Is that right?

7.9.2009 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

Do you still see the error? If so, can you provide a rough English translation, please? The document you shared shows that everything is working.

Hello, I switched the Kibana panel to English and the error message is as follows:

I wonder whether the KSPM and Elastic Defend integrations are incompatible, or whether my system version is too old and a newer version is required. I tried restarting the rule but didn't see any corresponding log information.

Hmm.. The screenshot includes an error that you'd see if Elastic Agent reports that Elastic Endpoint is not working correctly. But the policy document you shared from the Endpoint shows the Endpoint is in a successful state.

Only the Elastic Defend integration should affect the status message you see in Kibana. Whether other integrations are added or are working shouldn't affect the Elastic Defend integration status. If it's not too much trouble, perhaps you could try removing the other integrations from the affected Agent to confirm that's the case?

This thread's title mentions k8s. Can you describe how you've deployed the failed Agent and Endpoint? Maybe they need to be installed differently.

Also, can you run sudo /opt/Elastic/Agent/elastic-agent status and see what that shows for Endpoint's status?
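Since your Agent may be running in a container rather than installed directly on the host, something like this should work instead (adjust the namespace and DaemonSet name to match your deployment):

# run the status check inside the Agent container
kubectl exec -n kube-system ds/elastic-agent -- elastic-agent status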

I deployed the Elastic Agent with a K8s DaemonSet, and I checked the status of the Elastic Agent as you suggested. The error log looks like this. How can I solve it?

13:54:04.336 [elastic_agent][error] Component state changed endpoint-default (DEGRADED->FAILED): Failed: endpoint service missed 3 check-ins
13:54:04.337 [elastic_agent][error] Unit state changed endpoint-default (STARTING->FAILED): Failed: endpoint service missed 3 check-ins
13:54:04.337 [elastic_agent][error] Unit state changed endpoint-default-70a88160-1c8a-11ee-980f-f75ba97fe66d (STARTING->FAILED): Failed: endpoint service missed 3 check-ins
13:54:48.493 [elastic_agent][info] Updating running component model

The following is the K8s DaemonSet manifest I use:

[root@k8s k8s]# cat elastic-agent-managed-kubernetes.yml
---
# For more information https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-managed-by-fleet.html
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    app: elastic-agent
spec:
  selector:
    matchLabels:
      app: elastic-agent
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:
      # Tolerations are needed to run Elastic Agent on Kubernetes control-plane nodes.
      # Agents running on control-plane nodes collect metrics from the control plane components (scheduler, controller manager) of Kubernetes
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: elastic-agent
      hostNetwork: true
      # 'hostPID: true' enables the Elastic Security integration to observe all process exec events on the host.
      # Sharing the host process ID namespace gives visibility of all processes running on the same host.
      hostPID: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.8.1
          env:
            # Set to 1 for enrollment into Fleet server. If not set, Elastic Agent is run in standalone mode
            - name: FLEET_ENROLL
              value: "1"
            # Set to true to communicate with Fleet with either insecure HTTP or unverified HTTPS
            - name: FLEET_INSECURE
              value: "true"
            # Fleet Server URL to enroll the Elastic Agent into
            # FLEET_URL can be found in Kibana, go to Management > Fleet > Settings
            - name: FLEET_URL
              value: "https://10.122.131.98:8220"
            # Elasticsearch API key used to enroll Elastic Agents in Fleet (https://www.elastic.co/guide/en/fleet/current/fleet-enrollment-tokens.html#fleet-enrollment-tokens)
            # If FLEET_ENROLLMENT_TOKEN is empty then KIBANA_HOST, KIBANA_FLEET_USERNAME, KIBANA_FLEET_PASSWORD are needed
            - name: FLEET_ENROLLMENT_TOKEN
              value: "VlIybUhva0I5NUNVeHRjelRzckQ6ZU1sUFlVZWRSYkdmMmNfV095OWNmdw=="
            - name: KIBANA_HOST
              value: "http://10.122.131.98:5601"
            # The basic authentication username used to connect to Kibana and retrieve a service_token to enable Fleet
            - name: KIBANA_FLEET_USERNAME
              value: "elastic"
            # The basic authentication password used to connect to Kibana and retrieve a service_token to enable Fleet
            - name: KIBANA_FLEET_PASSWORD
              value: "efbFUE6G=hr0q4i=5mcW"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          securityContext:
            runAsUser: 0
            # The following capabilities are needed for 'Defend for containers' integration (cloud-defend)
            # If you are using this integration, please uncomment these lines before applying.
            #capabilities:
            #  add:
            #    - BPF # (since Linux 5.8) allows loading of BPF programs, create most map types, load BTF, iterate programs and maps.
            #    - PERFMON # (since Linux 5.8) allows attaching of BPF programs used for performance metrics and observability operations.
            #    - SYS_RESOURCE # Allow use of special resources or raising of resource limits. Used by 'Defend for Containers' to modify 'rlimit_memlock'
          resources:
            limits:
              memory: 700Mi
            requests:
              cpu: 100m
              memory: 400Mi
          volumeMounts:
            - name: proc
              mountPath: /hostfs/proc
              readOnly: true
            - name: cgroup
              mountPath: /hostfs/sys/fs/cgroup
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: etc-full
              mountPath: /hostfs/etc
              readOnly: true
            - name: var-lib
              mountPath: /hostfs/var/lib
              readOnly: true
            - name: etc-mid
              mountPath: /etc/machine-id
              readOnly: true
            - name: sys-kernel-debug
              mountPath: /sys/kernel/debug
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        # The following volumes are needed for Cloud Security Posture integration (cloudbeat)
        # If you are not using this integration, then these volumes and the corresponding
        # mounts can be removed.
        - name: etc-full
          hostPath:
            path: /etc
        - name: var-lib
          hostPath:
            path: /var/lib
        # Mount /etc/machine-id from the host to determine host ID
        # Needed for Elastic Security integration
        - name: etc-mid
          hostPath:
            path: /etc/machine-id
            type: File
        # Needed for 'Defend for containers' integration (cloud-defend)
        # If you are not using this integration, then these volumes and the corresponding
        # mounts can be removed.
        - name: sys-kernel-debug
          hostPath:
            path: /sys/kernel/debug
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: Role
  name: elastic-agent-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-agent
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - namespaces
      - events
      - pods
      - services
      - configmaps
      # Needed for cloudbeat
      - serviceaccounts
      - persistentvolumes
      - persistentvolumeclaims
    verbs: ["get", "list", "watch"]
  # Enable this rule only if planning to use kubernetes_secrets provider
  #- apiGroups: [""]
  #  resources:
  #  - secrets
  #  verbs: ["get"]
  - apiGroups: ["extensions"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources:
      - statefulsets
      - deployments
      - replicasets
      - daemonsets
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - ""
    resources:
      - nodes/stats
    verbs:
      - get
  - apiGroups: [ "batch" ]
    resources:
      - jobs
      - cronjobs
    verbs: [ "get", "list", "watch" ]
  # Needed for apiserver
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
  # Needed for cloudbeat
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources:
      - clusterrolebindings
      - clusterroles
      - rolebindings
      - roles
    verbs: ["get", "list", "watch"]
  # Needed for cloudbeat
  - apiGroups: ["policy"]
    resources:
      - podsecuritypolicies
    verbs: ["get", "list", "watch"]
  - apiGroups: [ "storage.k8s.io" ]
    resources:
      - storageclasses
    verbs: [ "get", "list", "watch" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
  # Should be the namespace where elastic-agent is running
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
---

Hello @ferullo, can you determine what problem is causing this result?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.