Hello
I have been having issues with Elastic Agent on my Proxmox hosts (Debian 12), in particular with the Endpoint integration, which never becomes Healthy. All other integrations, such as packet capture and system logs, are OK.
A few seconds after the Elastic Agent is restarted, the endpoint is unable to register with the agent:
# elastic-agent status
┌─ fleet
│  └─ status: (HEALTHY) Connected
└─ elastic-agent
   ├─ status: (DEGRADED) 1 or more components/units in a failed state
   └─ endpoint-default
      ├─ status: (FAILED) Failed: endpoint service missed 3 check-ins
      ├─ endpoint-default
      │  └─ status: (FAILED) Failed: endpoint service missed 3 check-ins
      └─ endpoint-default-92fa049c-8082-482f-9328-aa425f583e8e
         └─ status: (FAILED) Failed:
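To pull only the failing components and units out of output like this, a small filter along the following lines could help. This is a minimal sketch: the `failed_units` helper name is hypothetical, and it assumes each component or unit name is printed on the line directly above its `status:` line, as in the output above.

```shell
#!/bin/sh
# failed_units (hypothetical helper): read `elastic-agent status` output on
# stdin and print the name of every component/unit reported as FAILED.
# Assumption: a name line is immediately followed by its "status:" line.
failed_units() {
  awk '
    # A FAILED status line: print the name remembered from the previous line.
    /status: \(FAILED\)/ { if (prev != "") print prev; next }
    {
      line = $0
      # Strip the tree-drawing characters and leading spaces.
      sub(/^[^A-Za-z0-9]+/, "", line)
      prev = line
    }
  '
}
```

It would be used as `elastic-agent status | failed_units`.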
Here is the full set of logs from Elastic (newest entries first):
|Timestamp|Message|
|---|---|
|Mar 11, 2025 @ 18:17:12.251|Component state changed endpoint-default (DEGRADED->FAILED): Failed: endpoint service missed 3 check-ins|
|Mar 11, 2025 @ 18:17:12.251|Unit state changed endpoint-default-92fa049c-8082-482f-9328-aa425f583e8e (STARTING->FAILED): Failed: endpoint service missed 3 check-ins|
|Mar 11, 2025 @ 18:17:12.251|Unit state changed endpoint-default (STARTING->FAILED): Failed: endpoint service missed 3 check-ins|
|Mar 11, 2025 @ 18:16:12.250|Component state changed endpoint-default (STARTING->DEGRADED): Degraded: endpoint service missed 1 check-in|
|Mar 11, 2025 @ 18:15:43.146|Unit state changed log-default-logfile-system-e7140f01-940d-4ba2-9bcc-e9f20352aaf1 (STARTING->HEALTHY): Healthy|
|Mar 11, 2025 @ 18:15:43.145|Unit state changed log-default (STARTING->HEALTHY): Healthy|
|Mar 11, 2025 @ 18:15:42.984|Unit state changed osquery-default (STARTING->HEALTHY): Healthy|
|Mar 11, 2025 @ 18:15:42.984|Unit state changed osquery-default-51e20a89-436f-4787-93b9-cdd91a4699b1 (STARTING->HEALTHY): Healthy|
|Mar 11, 2025 @ 18:15:42.749|Unit state changed packet-default-packet-network-c10e94db-070f-41ca-8834-b1531c55ec52 (STARTING->HEALTHY): Healthy|
|Mar 11, 2025 @ 18:15:42.748|Unit state changed packet-default (STARTING->HEALTHY): Healthy|
|Mar 11, 2025 @ 18:15:42.564|control checkin v2 protocol has chunking enabled|
|Mar 11, 2025 @ 18:15:42.449|control checkin v2 protocol has chunking enabled|
|Mar 11, 2025 @ 18:15:42.326|control checkin v2 protocol has chunking enabled|
|Mar 11, 2025 @ 18:15:42.249|2025-03-11 16:15:42: info: InstallLib.cpp:650 Installed endpoint is expected version (version: 8.17.3, compiled: Wed Feb 26 21:00:00 2025, branch: HEAD, commit: e54b5de09796d1b3601f7d5472359c11fafafc67)|
|Mar 11, 2025 @ 18:15:42.249|after check if endpoint service is installed, err: <nil>|
|Mar 11, 2025 @ 18:15:42.148|2025-03-11 16:15:42: debug: ProcFile.cpp:855 Found 1 cgroups for pid(3355999)|
|Mar 11, 2025 @ 18:15:42.148|2025-03-11 16:15:42: debug: ProcFile.cpp:861 cgroup: id=0 type= path=/system.slice/elastic-agent.service|
|Mar 11, 2025 @ 18:15:42.148|2025-03-11 16:15:42: info: MainPosix.cpp:389 Verifying existing installation|
|Mar 11, 2025 @ 18:15:42.148|2025-03-11 16:15:42: info: InstallLib.cpp:610 Running [/opt/Elastic/Endpoint/elastic-endpoint] [version --log stdout]|
|Mar 11, 2025 @ 18:15:42.148|2025-03-11 16:15:42: debug: Exec.cpp:189 ChildMonitor is pid 3356002 and monitoring pids 3355999 and 3356000|
|Mar 11, 2025 @ 18:15:42.144|Creating connection info server for endpoint service, address: unix:///var/lib/elastic-agent/.eaci.sock|
|Mar 11, 2025 @ 18:15:42.144|check if endpoint service is installed|
|Mar 11, 2025 @ 18:15:42.144|Spawned new component endpoint-default: Starting: endpoint service runtime|
|Mar 11, 2025 @ 18:15:42.144|Spawned new unit endpoint-default-92fa049c-8082-482f-9328-aa425f583e8e: Starting: endpoint service runtime|
|Mar 11, 2025 @ 18:15:42.144|Spawned new unit endpoint-default: Starting: endpoint service runtime|
|Mar 11, 2025 @ 18:15:42.143|control checkin v2 protocol has chunking enabled|
|Mar 11, 2025 @ 18:15:42.143|Component state changed log-default (STARTING->HEALTHY): Healthy: communicating with pid '3355981'|
|Mar 11, 2025 @ 18:15:42.042|Spawned new component log-default: Starting: spawned pid '3355981'|
|Mar 11, 2025 @ 18:15:42.042|Spawned new unit log-default-logfile-system-e7140f01-940d-4ba2-9bcc-e9f20352aaf1: Starting: spawned pid '3355981'|
|Mar 11, 2025 @ 18:15:42.042|Spawned new unit log-default: Starting: spawned pid '3355981'|
|Mar 11, 2025 @ 18:15:41.982|control checkin v2 protocol has chunking enabled|
|Mar 11, 2025 @ 18:15:41.982|Component state changed osquery-default (STARTING->HEALTHY): Healthy: communicating with pid '3355964'|
|Mar 11, 2025 @ 18:15:41.913|Spawned new component osquery-default: Starting: spawned pid '3355964'|
|Mar 11, 2025 @ 18:15:41.913|Spawned new unit osquery-default-51e20a89-436f-4787-93b9-cdd91a4699b1: Starting: spawned pid '3355964'|
|Mar 11, 2025 @ 18:15:41.913|Spawned new unit osquery-default: Starting: spawned pid '3355964'|
|Mar 11, 2025 @ 18:15:41.796|Component state changed packet-default (STARTING->HEALTHY): Healthy: communicating with pid '3355947'|
|Mar 11, 2025 @ 18:15:41.746|control checkin v2 protocol has chunking enabled|
|Mar 11, 2025 @ 18:15:41.681|Spawned new component packet-default: Starting: spawned pid '3355947'|
|Mar 11, 2025 @ 18:15:41.681|Spawned new unit packet-default-packet-network-c10e94db-070f-41ca-8834-b1531c55ec52: Starting: spawned pid '3355947'|
|Mar 11, 2025 @ 18:15:41.681|Spawned new unit packet-default: Starting: spawned pid '3355947'|
|Mar 11, 2025 @ 18:15:41.503|Updating running component model|
|Mar 11, 2025 @ 18:15:41.503|SSL/TLS verifications disabled.|
|Mar 11, 2025 @ 18:15:41.493|Starting stats endpoint|
|Mar 11, 2025 @ 18:15:41.493|Metrics endpoint listening on: 127.0.0.1:6791 (configured: http://localhost:6791)|
|Mar 11, 2025 @ 18:15:41.492|Starting monitoring server with cfg &config.MonitoringConfig{Enabled:true, MonitorLogs:true, MonitorMetrics:true, MetricsPeriod:, FailureThreshold:(*uint)(nil), LogMetrics:true, HTTP:(*config.MonitoringHTTPConfig)(0xc001b828a0), Namespace:default, Pprof:(*config.PprofConfig)(nil), MonitorTraces:false, APM:config.APMConfig{Environment:, APIKey:, SecretToken:, Hosts:[]string(nil), GlobalLabels:map[string]string(nil), TLS:config.APMTLS{SkipVerify:false, ServerCertificate:, ServerCA:}, SamplingRate:(*float32)(nil)}, Diagnostics:config.Diagnostics{Uploader:config.Uploader{MaxRetries:10, InitDur:1000000000, MaxDur:600000000000}, Limit:config.Limit{Interval:60000000000, Burst:1}}}|
|Mar 11, 2025 @ 18:15:41.492|creating monitoring API with cfg api.Config{Enabled:true, Host:http://localhost:6791, Port:6791, User:, SecurityDescriptor:, Timeout:5000000000}|
|Mar 11, 2025 @ 18:15:41.491|Source URI changed from https://artifacts.elastic.co/downloads/ to https://artifacts.elastic.co/downloads/|
|Mar 11, 2025 @ 18:15:41.489|Fleet gateway started|
|Mar 11, 2025 @ 18:15:41.484|Setting fallback log level <nil> from policy|
|Mar 11, 2025 @ 18:15:41.477|restoring current policy from disk|
|Mar 11, 2025 @ 18:15:40.867|Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?|
|Mar 11, 2025 @ 18:15:40.866|GRPC control socket listening at unix:///var/lib/elastic-agent/elastic-agent.sock|
|Mar 11, 2025 @ 18:15:40.866|Starting grpc control protocol listener on port 6789 with max_message_size 104857600|
|Mar 11, 2025 @ 18:15:40.851|Parsed configuration and determined agent is managed by Fleet|
|Mar 11, 2025 @ 18:15:40.851|SSL/TLS verifications disabled.|
|Mar 11, 2025 @ 18:15:40.842|GRPC comms socket listening at localhost:6789|
|Mar 11, 2025 @ 18:15:40.607|Capabilities file not found in /etc/elastic-agent/capabilities.yml|
|Mar 11, 2025 @ 18:15:40.607|Determined allowed capabilities|
|Mar 11, 2025 @ 18:15:40.607|Loading baseline config from /etc/elastic-agent/elastic-agent.yml|
|Mar 11, 2025 @ 18:15:40.606|Detected available inputs and outputs|
|Mar 11, 2025 @ 18:15:40.595|Gathered system information|
|Mar 11, 2025 @ 18:15:40.589|APM instrumentation disabled|
|Mar 11, 2025 @ 18:15:40.587|agent is not upgradable, not starting watcher|
|Mar 11, 2025 @ 18:15:40.378|Elastic Agent started|
|Mar 11, 2025 @ 18:15:39.931|Fleet gateway stopped|
|Mar 11, 2025 @ 18:15:38.915|reexec shutdown channel triggered|
|Mar 11, 2025 @ 18:15:38.915|failed accept conn info connection: accept unix /var/lib/elastic-agent/.eaci.sock: use of closed network connection|
|Mar 11, 2025 @ 18:15:38.915|Possible transient error during checkin with fleet-server, retrying|
|Mar 11, 2025 @ 18:15:38.915|stopping endpoint service runtime|
Log output from the endpoint:
# cat /opt/Elastic/Endpoint/state/log/endpoint-000000.log
{"@timestamp":"2025-03-11T15:16:43.632877363Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":228,"name":"Logging.cpp"}}},"message":"Logging.cpp:228 Endpoint info: version: 8.17.0, compiled: Wed Dec 4 19:00:00 2024, branch: HEAD, commit: eea523e3a3b39f3a258c17d05b983a723bd86682","process":{"pid":3329644,"thread":{"id":3329644}}}
{"@timestamp":"2025-03-11T15:16:43.632902266Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":134,"name":"PolicyConfig.cpp"}}},"message":"PolicyConfig.cpp:134 Registered configuration callback for logging","process":{"pid":3329644,"thread":{"id":3329644}}}
{"@timestamp":"2025-03-11T15:16:43.632911736Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":176,"name":"Entry.cpp"}}},"message":"Entry.cpp:176 Loading plugin: documentLogging","process":{"pid":3329644,"thread":{"id":3329644}}}
{"@timestamp":"2025-03-11T15:52:19.154942089Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":228,"name":"Logging.cpp"}}},"message":"Logging.cpp:228 Endpoint info: version: 8.17.3, compiled: Wed Feb 26 21:00:00 2025, branch: HEAD, commit: e54b5de09796d1b3601f7d5472359c11fafafc67","process":{"pid":3345597,"thread":{"id":3345597}}}
{"@timestamp":"2025-03-11T15:52:19.154982504Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":134,"name":"PolicyConfig.cpp"}}},"message":"PolicyConfig.cpp:134 Registered configuration callback for logging","process":{"pid":3345597,"thread":{"id":3345597}}}
{"@timestamp":"2025-03-11T15:52:19.154990871Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":176,"name":"Entry.cpp"}}},"message":"Entry.cpp:176 Loading plugin: documentLogging","process":{"pid":3345597,"thread":{"id":3345597}}}
{"@timestamp":"2025-03-11T16:02:20.231786761Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":89,"name":"MainPosix.cpp"}}},"message":"MainPosix.cpp:89 Aborting due to signal","process":{"pid":3345597,"thread":{"id":3345600}}}
I checked networking, because I thought something might be blocking the connection between the agent and the endpoint service, but I could see the raw packets flowing normally:
# ss -ntulp | grep elastic
tcp LISTEN 0 4096 127.0.0.1:6789 0.0.0.0:* users:(("elastic-agent",pid=3344921,fd=12))
tcp LISTEN 0 4096 127.0.0.1:6791 0.0.0.0:* users:(("elastic-agent",pid=3344921,fd=13))
# tcpdump -i any -nn port 6789 or port 6791
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
17:21:29.588552 lo In IP 127.0.0.1.51694 > 127.0.0.1.6791: Flags [.], ack 4192996267, win 512, options [nop,nop,TS val 3041978560 ecr 3041963200], length 0
17:21:29.588556 lo In IP 127.0.0.1.6791 > 127.0.0.1.51694: Flags [.], ack 1, win 512, options [nop,nop,TS val 3041978560 ecr 3041963200], length 0
17:21:29.588595 lo In IP 127.0.0.1.6791 > 127.0.0.1.51694: Flags [.], ack 1, win 512, options [nop,nop,TS val 3041978560 ecr 3041963200], length 0
17:21:29.588598 lo In IP 127.0.0.1.51694 > 127.0.0.1.6791: Flags [.], ack 1, win 512, options [nop,nop,TS val 3041978560 ecr 3041978560], length 0
17:21:32.749659 lo In IP 127.0.0.1.35932 > 127.0.0.1.6789: Flags [P.], seq 3700583884:3700584202, ack 2357837195, win 512, options [nop,nop,TS val 3041981721 ecr 3041956762], length 318
17:21:32.749664 lo In IP 127.0.0.1.6789 > 127.0.0.1.35932: Flags [.], ack 318, win 512, options [nop,nop,TS val 3041981721 ecr 3041981721], length 0
17:21:32.749794 lo In IP 127.0.0.1.6789 > 127.0.0.1.35932: Flags [P.], seq 1:53, ack 318, win 512, options [nop,nop,TS val 3041981721 ecr 3041981721], length 52
17:21:32.749879 lo In IP 127.0.0.1.35932 > 127.0.0.1.6789: Flags [P.], seq 318:357, ack 53, win 512, options [nop,nop,TS val 3041981721 ecr 3041981721], length 39
17:21:32.790717 lo In IP 127.0.0.1.6789 > 127.0.0.1.35932: Flags [.], ack 357, win 512, options [nop,nop,TS val 3041981762 ecr 3041981721], length 0
17:21:32.984679 lo In IP 127.0.0.1.35936 > 127.0.0.1.6789: Flags [P.], seq 138184057:138184490, ack 275999808, win 512, options [nop,nop,TS val 3041981956 ecr 3041956998], length 433
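The agent log above also mentions a unix socket for the endpoint service (`unix:///var/lib/elastic-agent/.eaci.sock`), which a capture on TCP ports 6789 and 6791 would not show. A minimal sketch for checking it; the `check_sock` helper name is hypothetical, and the socket path is simply the one from the log:

```shell
#!/bin/sh
# check_sock (hypothetical helper): report whether a path exists and is a
# unix domain socket.
check_sock() {
  if [ -S "$1" ]; then
    echo "socket present: $1"
  else
    echo "socket missing: $1"
  fi
}

# Path taken from the "Creating connection info server" log line.
check_sock /var/lib/elastic-agent/.eaci.sock

# List listening unix sockets with their owning processes, if ss is available
# (run as root to see process names).
if command -v ss >/dev/null 2>&1; then
  ss -xlp | grep -i elastic || true
fi
```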