filebeat.sock file no longer exists after deploying and configuring elastic-agent with Ansible

Hello Community:

I'm trying to automate the deployment of Elastic Agents using Ansible, and I have two roles for this.
With the first role, I enroll the agent using the corresponding enroll command. Everything works up to this point, and the agent reports as healthy in Fleet Server.
The second role configures the System integration (Filebeat/Metricbeat) to use the correct Elasticsearch cluster hosts and the correct certificates for those hosts. For this I have a template, (filebeat/metricbeat).yml.j2, with the correct values.
For now I'm only trying to configure Filebeat.
The problem appears when I run the playbook with the second role to configure the agent's integration:
1- The Ansible output is clean; it does not throw any errors.
2- The agent shows as Unhealthy in Fleet.
3- On the target host, this is the output of elastic-agent status:

Status: FAILED
Message: (no message)
Applications:
  * metricbeat             (CONFIGURING)
                           Updating configuration
  * osquerybeat            (CONFIGURING)
                           Updating configuration
  * filebeat               (FAILED)
                           Missed two check-ins
  * filebeat_monitoring    (FAILED)
                           Missed two check-ins
  * metricbeat_monitoring  (CONFIGURING)
                           Updating configuration

4- Output of elastic-agent diagnostics:
elastic-agent diagnostics

elastic-agent  version: 7.17.1
               build_commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff  build_time: 2022-02-24 09:30:06 +0000 UTC  snapshot_build: false
Applications:
  *  name: metricbeat      route_key: default
     process: metricbeat   id: 16f9b1f5-2202-410e-b2f6-8aa5f2d5ab5f          ephemeral_id: 0aeb3974-946a-403b-94f0-e686b0133296  elastic_license: true
     version: 7.17.1       commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff  build_time: 2022-02-23 23:50:00 +0000 UTC           binary_arch: amd64
     hostname: srvh0000    username: root                                    user_id: 0                                          user_gid: 0
  *  name: osquerybeat     route_key: default
     process: osquerybeat  id: c3df3813-6136-4330-878e-af2d5e8955df          ephemeral_id: e18de22a-ea29-44c3-ab97-630bcfd3a503  elastic_license: true
     version: 7.17.1       commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff  build_time: 2022-02-23 23:27:49 +0000 UTC           binary_arch: amd64
     hostname: srvh0000    username: root                                    user_id: 0                                          user_gid: 0
  *  name: filebeat        route_key: default
     error: Get "http://unix/": dial unix /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory
  *  name: filebeat_monitoring  route_key: default
     error: Get "http://unix/": dial unix /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory
  *  name: metricbeat_monitoring  route_key: default
     process: metricbeat          id: 16f9b1f5-2202-410e-b2f6-8aa5f2d5ab5f          ephemeral_id: 0aeb3974-946a-403b-94f0-e686b0133296  elastic_license: true
     version: 7.17.1              commit: 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff  build_time: 2022-02-23 23:50:00 +0000 UTC           binary_arch: amd64
     hostname: srvh0000           username: root                                    user_id: 0                                          user_gid: 0

Note the error thrown by filebeat: error: Get "http://unix/": dial unix /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock: connect: no such file or directory
Obviously the file doesn't exist. In my experience (which is NOT vast), this file is created when there is a successful connection to Fleet Server. I have done this procedure manually many times and it always works fine.

5- Logs (the hour difference is because the elastic-agent log timestamps are in UTC, marked "Z", while the filebeat log uses local time, -03:00; everything is part of the same run, note the minutes)

---Log of elastic-agent (final lines, when the error pops up):


{"log.level":"info","@timestamp":"2022-04-08T15:26:33.102Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 3 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.425Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":62},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.425Z","log.origin":{"file.name":"application/managed_mode.go","file.line":291},"message":"Agent is starting","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.426Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":64},"message":"Metrics endpoint listening on: /var/lib/elastic-agent/data/tmp/elastic-agent.sock (configured: unix:///var/lib/elastic-agent/data/tmp/elastic-agent.sock)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.838Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:33.838Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.289Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.289Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.648Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-04-08T12:26:34-03:00 - message: Application: metricbeat--7.17.1--36643631373035623733363936343635[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to CONFIG: Updating configuration - type: 'STATE' - sub_type: 'CONFIG'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.698Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:34.698Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for filebeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:35.257Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-install' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:35.258Z","log.origin":{"file.name":"operation/operator.go","file.line":284},"message":"operation 'operation-start' skipped for metricbeat.7.17.1","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:26:35.262Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":66},"message":"Updating internal state","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:27:34.482Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-04-08T12:27:34-03:00 - message: Application: filebeat--7.17.1[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2022-04-08T15:27:34.482Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'degraded'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T15:27:34.482Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-04-08T12:27:34-03:00 - message: Application: filebeat--7.17.1--36643631373035623733363936343635[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T15:28:34.493Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T15:28:34.493Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-04-08T12:28:34-03:00 - message: Application: filebeat--7.17.1[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T15:28:34.496Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-04-08T12:28:34-03:00 - message: Application: filebeat--7.17.1--36643631373035623733363936343635[47e74575-ce39-4910-9471-3fd0a494aecc]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

---Log of filebeat:

{"log.level":"info","@timestamp":"2022-04-08T12:23:32.642-0300","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 57 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:23:32.642-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:23:32.642-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T12:23:32.649-0300","log.logger":"esclientleg","log.origin":{"file.name":"transport/logging.go","file.line":37},"message":"Error dialing dial tcp 127.0.0.1:9200: connect: connection refused","service.name":"filebeat","network":"tcp","address":"localhost:9200","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:23:38.353-0300","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":184},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cpu":{"system":{"ticks":11380,"time":{"ms":24}},"total":{"ticks":25150,"time":{"ms":41},"value":25150},"user":{"ticks":13770,"time":{"ms":17}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"6c6b4767-2978-4a68-be18-908da9c77d70","uptime":{"ms":2370365},"version":"7.17.1"},"memstats":{"gc_next":60359568,"memory_alloc":37558272,"memory_total":472161176,"rss":153415680},"runtime":{"goroutines":27}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"active":0}},"pipeline":{"clients":0,"events":{"active":4116,"retry":50}}},"registrar":{"states":{"current":10}},"system":{"load":{"1":0.08,"15":0.07,"5":0.06,"norm":{"1":0.04,"15":0.035,"5":0.03}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2022-04-08T12:24:08.357-0300","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":184},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cpu":{"system":{"ticks":11410,"time":{"ms":26}},"total":{"ticks":25200,"time":{"ms":50},"value":25200},"user":{"ticks":13790,"time":{"ms":24}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"6c6b4767-2978-4a68-be18-908da9c77d70","uptime":{"ms":2400347},"version":"7.17.1"},"memstats":{"gc_next":60359568,"memory_alloc":37770928,"memory_total":472373832,"rss":153415680},"runtime":{"goroutines":27}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"active":0}},"pipeline":{"clients":0,"events":{"active":4116}}},"registrar":{"states":{"current":10}},"system":{"load":{"1":0.13,"15":0.08,"5":0.07,"norm":{"1":0.065,"15":0.04,"5":0.035}}}},"ecs.version":"1.6.0"}}
{"log.level":"error","@timestamp":"2022-04-08T12:24:32.266-0300","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": dial tcp 127.0.0.1:9200: connect: connection refused","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:24:32.266-0300","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 58 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:24:32.267-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-04-08T12:24:32.267-0300","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-04-08T12:24:32.269-0300","log.logger":"esclientleg","log.origin":{"file.name":"transport/logging.go","file.line":37},"message":"Error dialing dial tcp 127.0.0.1:9200: connect: connection refused","service.name":"filebeat","network":"tcp","address":"localhost:9200","ecs.version":"1.6.0"}

6- Configurations:

filebeat.yml, located under /var/lib/elastic-agent/data/elastic-agent-1d05ba/install/filebeat-7.17.1-linux-x86_64/
Ansible correctly replaces this file using my template.


# ---------------------------- Elasticsearch Output ----------------------------

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["server1:9200", "server2:9200", "server3:9200"]

  # Protocol - either `http` (default) or `https`.
  protocol: "https"
  ssl.certificate_authorities: ["/etc/elastic-agent/server1.crt", "/etc/elastic-agent/server2.crt", "/etc/elastic-agent/server3.crt"]

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "this is the password"

7- Ansible role to configure Filebeat (the second role mentioned at the beginning of the post):

---
- name: Stop elastic-agent service
  ansible.builtin.service:
    name: elastic-agent
    state: stopped

- name: Get current Elastic Agent data folder
  ansible.builtin.find:
    paths: /var/lib/elastic-agent/data
    file_type: directory
    patterns: 'elastic-agent-*'
  register: _result

#- name: Print return information from the previous task
#  ansible.builtin.debug:
#    msg: "{{ item.path }}"
#  loop: "{{ _result.files }}"

- name: Configure Elastic Agent - Filebeat from template
  ansible.builtin.template:
    src: filebeat.yml.j2
    dest: "{{ item.path }}/install/filebeat-7.17.1-linux-x86_64/filebeat.yml"
    owner: root
    group: root
    mode: "0660"
  loop: "{{ _result.files }}"
  #notify: restart elastic-agent

- name: Configure Elastic Agent - Filebeat from files
  ansible.builtin.copy:
    src: "{{ item | basename }}"
    dest: "{{ item }}"
    owner: root
    group: root
    mode: "0660"
  loop:
    - "/etc/elastic-agent/server1.crt"
    - "/etc/elastic-agent/server2.crt"
    - "/etc/elastic-agent/server3.crt"
  #notify: restart elastic-agent

- name: Start Elastic Agent
  ansible.builtin.service:
    name: elastic-agent
    state: started
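
The role finishes "clean" because no task checks the agent after it is restarted, so a Filebeat that never checks in goes unnoticed by Ansible. A minimal sketch of extra tasks that could be appended to the role to make the play fail instead. Assumptions: the socket path is the one reported by elastic-agent diagnostics above, a healthy agent prints a "Status: HEALTHY" line (mirroring the "Status: FAILED" line shown earlier), and the timeouts are arbitrary values chosen to outlast the two missed check-ins (about two minutes in the logs).

```yaml
# Hypothetical verification tasks, appended after "Start Elastic Agent".
- name: Wait for the Filebeat control socket to appear
  ansible.builtin.wait_for:
    # Path taken from the elastic-agent diagnostics output
    path: /var/lib/elastic-agent/data/tmp/default/filebeat/filebeat.sock
    timeout: 120          # seconds; arbitrary, adjust to your hosts

- name: Wait until elastic-agent reports HEALTHY
  ansible.builtin.command: elastic-agent status
  register: _agent_status
  changed_when: false     # a read-only check, never report "changed"
  # Retry until the status text contains the healthy marker (assumed text)
  until: "'Status: HEALTHY' in _agent_status.stdout"
  retries: 18             # 18 x 10 s = 3 minutes
  delay: 10
```

If either task fails, the play stops on that host and the last status output is shown in the Ansible result, which is easier to spot than an Unhealthy agent in Fleet.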

As I mentioned before, I have done this procedure (enroll and configure the agent) manually several times and it always works fine. Can anyone help me with this? Is there anything else I can provide to help with troubleshooting?
Sorry for the long post, best regards to all.
Thanks in advance

Hi... could anyone help here? Any idea will be appreciated! Thanks to all!!

Are you deploying standalone or Fleet-managed elastic-agents?

If the latter, why not configure the policies centrally?

Hello @zx8086. Thank you very much for taking the time to read my post!
We are deploying Fleet-managed elastic-agents.
Yes, we have a set of policies defined in Fleet, but our main goal is to deploy multiple elastic-agents at once using Ansible. As I said before, the role that deploys the agents and enrolls them in Fleet Server (using the predefined policy) works well. On the other hand, when we run the role that configures the integration to output to ES, the filebeat.sock file disappears...
Ansible throws no errors. If you need the Ansible output, or anything else, please let me know.
Thank you, best regards

@Danny_Dumenigo So you deploy the elastic-agents and they run fine... and you are trying to configure the elastic-agent's Filebeat service, right?

Yes, the deployment goes fine, but the policy has 2 integrations (System and Osquery). We use certificates in our ES, so I have to configure the output of the System integration's Filebeat/Metricbeat to use those certificates, and that step is when everything gets wrecked.

Can you not configure the elastic-agent first, and then create the policy and integration via the Fleet API?
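
A minimal sketch of that suggestion using Ansible's uri module, run once from the control node rather than on each agent host. The variables kibana_url, kibana_user and kibana_password, and the policy name, are hypothetical placeholders; the call is the agent-policy create endpoint of the Fleet API, and Kibana expects a kbn-xsrf header on write requests. The response shape (json.item.id) is an assumption to check against the Fleet API reference for your version. Integrations are attached in a similar way through the package-policy endpoints of the same API; that request body depends on the integration, so it is not shown here.

```yaml
# Run once, e.g. with hosts: localhost; all variables are placeholders.
- name: Create an agent policy through the Fleet API
  ansible.builtin.uri:
    url: "{{ kibana_url }}/api/fleet/agent_policies"
    method: POST
    user: "{{ kibana_user }}"
    password: "{{ kibana_password }}"
    force_basic_auth: true
    headers:
      kbn-xsrf: "true"          # Kibana rejects non-GET requests without it
    body_format: json
    body:
      name: "linux-servers"     # hypothetical policy name
      namespace: default
    status_code: [200, 201]
  register: _policy

- name: Show the id of the new policy (assumed response shape)
  ansible.builtin.debug:
    msg: "{{ _policy.json.item.id }}"
```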

Sorry, I got lost in your suggestion; I'm not a very experienced user. Please explain how to do what you suggest, so I can recreate the scenario on my side and test it.

What I always do is:
1- manually create and configure the policy and its integrations,
2- enroll the elastic-agent using the enrollment token that belongs to that policy,
3- set the Elasticsearch output in filebeat.yml, located under /var/lib/elastic-agent/data/elastic-agent-1d05ba/install/filebeat-7.17.1-linux-x86_64/, to use the certificates and my ES hosts.

Those are the steps I'm trying to automate (in this case only steps 2 and 3).
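
Step 2 can be expressed as a single Ansible task. A sketch, assuming the agent archive is already extracted on the host into a hypothetical agent_dir, that fleet_url, enrollment_token and fleet_ca_path are supplied as (hypothetical) variables, and that the default Linux install directory /opt/Elastic/Agent is a good enough marker that the agent is already installed. The --certificate-authorities flag is only needed if Fleet Server itself uses a certificate from a private CA.

```yaml
- name: Install and enroll Elastic Agent into Fleet
  ansible.builtin.command:
    argv:
      - ./elastic-agent
      - install
      - --force                                    # do not prompt for confirmation
      - "--url={{ fleet_url }}"
      - "--enrollment-token={{ enrollment_token }}"
      - "--certificate-authorities={{ fleet_ca_path }}"
    chdir: "{{ agent_dir }}"                       # where the archive was extracted
    creates: /opt/Elastic/Agent                    # assumed install dir; skip if present
```

Using argv instead of a single command string keeps the token out of shell quoting issues, and creates makes the task idempotent on reruns.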

Best Regards

I'm wondering why you don't use Ansible and its uri module (GET / POST / PATCH) to configure your setup through the Fleet API.

https://petstore.swagger.io/?url=https://raw.githubusercontent.com/elastic/kibana/7.x/x-pack/plugins/fleet/common/openapi/bundled.json#/default/agent-policy-list
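
As a first step with that approach, a minimal sketch that calls the agent-policy list endpoint from the link above and prints each policy's name and id. The kibana_url, kibana_user and kibana_password variables are hypothetical; the response is assumed to carry the policies in an items list. Note the bracket syntax: json.items would resolve to the dict's items() method in Jinja2 instead of the key.

```yaml
- name: List agent policies through the Fleet API
  ansible.builtin.uri:
    url: "{{ kibana_url }}/api/fleet/agent_policies"
    method: GET                 # read-only, so no kbn-xsrf header needed
    user: "{{ kibana_user }}"
    password: "{{ kibana_password }}"
    force_basic_auth: true
    return_content: true
  register: _policies

- name: Print policy names and ids
  ansible.builtin.debug:
    msg: "{{ item.name }} -> {{ item.id }}"
  loop: "{{ _policies.json['items'] }}"   # bracket access avoids dict.items()
```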

Yes, you are right. I didn't use it in the first place because everything in my plan was working as expected until the YAML configuration step, and I was trying to automatically recreate the steps I do manually. I will take a look at this and come back, hopefully with some positive results...
Best regards, and thank you for the tips!

@Danny_Dumenigo

After trying various deployment methods, including one like your approach, I found that using Ansible with the Fleet API was the best / most effective.

This way, after the initial installation / deployment, everything else is controlled centrally without touching the host again.
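
For step 3 (Elasticsearch hosts plus certificates), the central counterpart of editing each host's filebeat.yml is the Fleet output setting. A hedged sketch, assuming the 7.17 Fleet API exposes PUT /api/fleet/outputs/{id} with hosts and config_yaml fields; verify the endpoint and field names against the API reference linked above before relying on it. output_id is a hypothetical variable (a GET on the outputs endpoint should list the default output's id). The CA paths in config_yaml are resolved on each agent host, so the .crt files still have to be placed there.

```yaml
# Assumption: field names follow the 7.17 Fleet outputs API; verify first.
- name: Point the Fleet output at the secured Elasticsearch cluster
  ansible.builtin.uri:
    url: "{{ kibana_url }}/api/fleet/outputs/{{ output_id }}"
    method: PUT
    user: "{{ kibana_user }}"
    password: "{{ kibana_password }}"
    force_basic_auth: true
    headers:
      kbn-xsrf: "true"
    body_format: json
    body:
      hosts:
        - "https://server1:9200"
        - "https://server2:9200"
        - "https://server3:9200"
      # Extra output settings, passed to the Beats as YAML text
      config_yaml: |
        ssl.certificate_authorities:
          - /etc/elastic-agent/server1.crt
          - /etc/elastic-agent/server2.crt
          - /etc/elastic-agent/server3.crt
```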

Ok, I will try to configure the elastic-agent integrations using this approach then. One more question comes up: if I control the configuration centrally through Fleet, which deploys one configuration to all agents, and I have different operating systems (basically Windows and Linux) whose path handling differs (the installation and certificate directories are not the same on both), how can I deal with this?
I guess I will take this one step at a time. I'll keep you informed of any progress I make.
Best regards!!

Between Ansible and Fleet deployments, that shouldn't be an issue.
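
On the Ansible side, OS differences are usually handled with gathered facts. A minimal sketch that places the CA files per OS, assuming facts are gathered, es_ca_files is a hypothetical list of the .crt file names in the role's files directory, and C:\ProgramData\elastic-certs is a hypothetical Windows destination (the Linux path is the one already used in the role above).

```yaml
# Linux hosts: same destination as the existing role
- name: Copy CA certificates (Linux)
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/etc/elastic-agent/{{ item }}"
    owner: root
    group: root
    mode: "0660"
  loop: "{{ es_ca_files }}"
  when: ansible_facts['os_family'] != 'Windows'

# Windows hosts: hypothetical directory, created first
- name: Ensure CA directory exists (Windows)
  ansible.windows.win_file:
    path: 'C:\ProgramData\elastic-certs'
    state: directory
  when: ansible_facts['os_family'] == 'Windows'

- name: Copy CA certificates (Windows)
  ansible.windows.win_copy:
    src: "{{ item }}"
    dest: 'C:\ProgramData\elastic-certs\{{ item }}'   # single quotes keep backslashes literal
  loop: "{{ es_ca_files }}"
  when: ansible_facts['os_family'] == 'Windows'
```

Keeping the file names identical on both systems means only the directory differs, which limits how much the centrally managed configuration has to know about each OS.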

Ok, great! I will do some tests and get back to you ASAP.
Best regards

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.