Hello Community:
I'm trying to automate the deployment of Elastic Agents using Ansible, and I have two roles for this.
The first role enrolls the agent with the corresponding command, and everything is fine up to that point: the agent reports as Healthy in the Fleet server.
The second role configures the System integration (Filebeat/Metricbeat) to use the correct Elasticsearch cluster hosts as well as the
correct certificates for those hosts; for this I have a (filebeat/metricbeat).yml.j2 template with the correct values.
For now I am only trying to configure Filebeat.
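For reference, the enrollment in the first role is essentially a task like the following (simplified sketch, not my actual role code; fleet_url and fleet_enrollment_token are placeholder variables):

- name: Enroll Elastic Agent in Fleet
  ansible.builtin.command: >
    elastic-agent enroll
    --url={{ fleet_url }}
    --enrollment-token={{ fleet_enrollment_token }}
    --force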
The problem comes when I run the playbook with the second role to configure the agent's integration. Let me explain:
1- The Ansible output is clean; it does not throw any errors.
2- The agent shows as Unhealthy in Fleet.
3- On the target host, this is the output of elastic-agent status:
Status: FAILED
Message: (no message)
Applications:
  * metricbeat             (CONFIGURING)
                           Updating configuration
  * osquerybeat            (CONFIGURING)
                           Updating configuration
  * filebeat               (FAILED)
                           Missed two check-ins
  * filebeat_monitoring    (FAILED)
                           Missed two check-ins
  * metricbeat_monitoring  (CONFIGURING)
                           Updating configuration
4- Output of elastic-agent diagnostics:
elastic-agent diagnostics
elastic-agent version: 7.17.1
build_commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff build_time: 2022-02-24 09:30:06 +0000 UTC snapshot_build: false
Applications:
* name: metricbeat route_key: default
process: metricbeat id: 16f9b1f5-2202-410e-b2f6-8aa5f2d5ab5f ephemeral_id: 0aeb3974-946a-403b-94f0-e686b0133296 elastic_license: true
version: 7.17.1 commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff build_time: 2022-02-23 23:50:00 +0000 UTC binary_arch: amd64
hostname: srvh0000 username: root user_id: 0 user_gid: 0
* name: osquerybeat route_key: default
process: osquerybeat id: c3df3813-6136-4330-878e-af2d5e8955df ephemeral_id: e18de22a-ea29-44c3-ab97-630bcfd3a503 elastic_license: true
version: 7.17.1 commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff build_time: 2022-02-23 23:27:49 +0000 UTC binary_arch: amd64
hostname: srvh0000 username: root user_id: 0 user_gid: 0
* name: filebeat route_key: default
error: Get "http://unix/": dial unix /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory
* name: filebeat_monitoring route_key: default
error: Get "http://unix/": dial unix /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory
* name: metricbeat_monitoring route_key: default
process: metricbeat id: 16f9b1f5-2202-410e-b2f6-8aa5f2d5ab5f ephemeral_id: 0aeb3974-946a-403b-94f0-e686b0133296 elastic_license: true
version: 7.17.1 commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff build_time: 2022-02-23 23:50:00 +0000 UTC binary_arch: amd64
hostname: srvh0000 username: root user_id: 0 user_gid: 0
Note the error thrown by filebeat: error: Get "http://unix/": dial unix /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory
Obviously the file doesn't exist. In my experience (which is NOT vast), this socket file is created when there is a successful connection to the Fleet server. I have done this procedure manually many times and it always works fine.
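(Just as a diagnostic idea, a couple of ad-hoc tasks like these could confirm whether the socket appears after the restart; they are only for checking, not part of my roles:)

- name: Check for the filebeat control socket
  ansible.builtin.stat:
    path: /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock
  register: filebeat_sock

- name: Show whether the socket exists
  ansible.builtin.debug:
    msg: "filebeat.sock exists: {{ filebeat_sock.stat.exists }}"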
5- Logs (the hour difference is, I guess, due to the offset between the hardware clock and the OS time, but everything belongs to the same run; note the minutes):
---Log of elastic-agent (final lines when the error pops up):
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.102Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 3 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.425Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":62},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.425Z","log.origin":{"file.name":"application/managed_mode.go","file.line":291},"message":"Agent is starting","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.426Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":64},"message":"Metrics endpoint listening on: /var/lib/elastic-agent/data/tmp/elastic-agent.sock (configured: unix:///var/lib/elastic-agent/data/tmp/elastic-agent.sock)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.838Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.838Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.289Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.289Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.648Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-04-08T12:26:34-03:00 - message: Application: metricbeat--7.17.1--36643631373035623733363936343635[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to CONFIG: Updating configuration - type: 'STATE' - sub_type: 'CONFIG'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.698Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.698Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:35.257Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:35.258Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:35.262Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":66},"message":"Updating internal state","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:27:34.482Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-04-08T12:27:34-03:00 - message: Application: filebeat--7.17.1[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2022-04-08T15:27:34.482Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'degraded'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:27:34.482Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-04-08T12:27:34-03:00 - message: Application: filebeat--7.17.1--36643631373035623733363936343635[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T15:28:34.493Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T15:28:34.493Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-04-08T12:28:34-03:00 - message: Application: filebeat--7.17.1[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T15:28:34.496Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-04-08T12:28:34-03:00 - message: Application: filebeat--7.17.1--36643631373035623733363936343635[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
---Log of filebeat:
{"log.level":"info","@timestamp":"2022-04-08T12:23:32.642-0300","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 57 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:23:32.642-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:23:32.642-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":" done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T12:23:32.649-0300","log.logger":"esclientleg","log.origin":{"file.name":"transport/logging.go","file.line":37},"message":"Error dialing dial tcp 127.0.0.1:9200: connect: connection refused","service.name":"filebeat","network":"tcp","address":"localhost:9200","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:23:38.353-0300","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":184},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cpu":{"system":{"ticks":11380,"time":{"ms":24}},"total":{"ticks":25150,"time":{"ms":41},"value":25150},"user":{"ticks":13770,"time":{"ms":17}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"6c6b4767-2978-4a68-be18-908da9c77d70","uptime":{"ms":2370365},"version":"7.17.1"},"memstats":{"gc_next":60359568,"memory_alloc":37558272,"memory_total":472161176,"rss":153415680},"runtime":{"goroutines":27}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"active":0}},"pipeline":{"clients":0,"events":{"active":4116,"retry":50}}},"registrar":{"states":{"current":10}},"system":{"load":{"1":0.08,"15":0.07,"5":0.06,"norm":{"1":0.04,"15":0.035,"5":0.03}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2022-04-08T12:24:08.357-0300","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":184},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cpu":{"system":{"ticks":11410,"time":{"ms":26}},"total":{"ticks":25200,"time":{"ms":50},"value":25200},"user":{"ticks":13790,"time":{"ms":24}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"6c6b4767-2978-4a68-be18-908da9c77d70","uptime":{"ms":2400347},"version":"7.17.1"},"memstats":{"gc_next":60359568,"memory_alloc":37770928,"memory_total":472373832,"rss":153415680},"runtime":{"goroutines":27}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"active":0}},"pipeline":{"clients":0,"events":{"active":4116}}},"registrar":{"states":{"current":10}},"system":{"load":{"1":0.13,"15":0.08,"5":0.07,"norm":{"1":0.065,"15":0.04,"5":0.035}}}},"ecs.version":"1.6.0"}}
{"log.level":"error","@timestamp":"2022-04-08T12:24:32.266-0300","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": dial tcp 127.0.0.1:9200: connect: connection refused","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:24:32.266-0300","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 58 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:24:32.267-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:24:32.267-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":" done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T12:24:32.269-0300","log.logger":"esclientleg","log.origin":{"file.name":"transport/logging.go","file.line":37},"message":"Error dialing dial tcp 127.0.0.1:9200: connect: connection refused","service.name":"filebeat","network":"tcp","address":"localhost:9200","ecs.version":"1.6.0"}
6- Configurations:
filebeat.yml, located under /var/lib/elastic-agent/data/elastic-agent-1d05ba/install/filebeat-7.17.1-linux-x86_64/
This file is correctly rendered by Ansible from my template.
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["server1:9200", "server2:9200", "server3:9200"]
  # Protocol - either `http` (default) or `https`.
  protocol: "https"
  ssl.certificate_authorities: ["/etc/elastic-agent/server1.crt", "/etc/elastic-agent/server2.crt", "/etc/elastic-agent/server3.crt"]
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "this is the password"
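The filebeat.yml.j2 template behind it is essentially the same block with variables, roughly like this (simplified; elasticsearch_hosts and elastic_password are placeholder variable names, not my real ones):

# filebeat.yml.j2 (simplified sketch)
output.elasticsearch:
  hosts: [{% for host in elasticsearch_hosts %}"{{ host }}:9200"{% if not loop.last %}, {% endif %}{% endfor %}]
  protocol: "https"
  ssl.certificate_authorities: [{% for host in elasticsearch_hosts %}"/etc/elastic-agent/{{ host }}.crt"{% if not loop.last %}, {% endif %}{% endfor %}]
  username: "elastic"
  password: "{{ elastic_password }}"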
7- Ansible role to configure Filebeat (the second role I mentioned at the beginning of the post):
---
- name: Stop elastic-agent service
  service:
    name: elastic-agent
    state: stopped

- name: Get current Elastic Agent data folder
  find:
    paths: /var/lib/elastic-agent/data
    file_type: directory
    patterns: 'elastic-agent-*'
  register: _result

#- name: Print return information from the previous task
#  ansible.builtin.debug:
#    msg: "{{ item.path }}"
#  with_items: "{{ _result.files }}"

- name: Configure Elastic Agent - Filebeat from template
  ansible.builtin.template:
    src: filebeat.yml.j2
    dest: "{{ item.path }}/install/filebeat-7.17.1-linux-x86_64/filebeat.yml"
    owner: root
    group: root
    mode: '0660'
  with_items: "{{ _result.files }}"
  #notify: restart elastic-agent

- name: Configure Elastic Agent - Filebeat from files
  ansible.builtin.copy:
    src: "{{ item | basename }}"
    dest: "{{ item }}"
    owner: root
    group: root
    mode: '0660'
  with_items:
    - "/etc/elastic-agent/server1.crt"
    - "/etc/elastic-agent/server2.crt"
    - "/etc/elastic-agent/server3.crt"
  #notify: restart elastic-agent

- name: Start Elastic Agent
  service:
    name: elastic-agent
    state: started
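(For completeness: the commented-out notify lines would point to a handler like the one below, which I have not wired up yet; the role currently does an explicit stop/start instead.)

# handlers/main.yml (sketch)
- name: restart elastic-agent
  ansible.builtin.service:
    name: elastic-agent
    state: restarted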
As I mentioned before, I have done this procedure (enroll and configure the agent) manually several times and it always works fine. Can anyone help me with this? Can I provide anything else to help with the troubleshooting?
Sorry for the long post; best regards to all.
Thanks in advance