I am now getting Endpoint Security data in data streams. The solution to that problem was to fix a typo in the ssl.certificate_authorities setting in Fleet Settings.
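For reference, the corrected block in the advanced Elasticsearch output YAML under Fleet Settings now looks roughly like this (the path is the same CA chain file my agents reference in fleet.yml; adjust per environment):

ssl:
  certificate_authorities: ["/etc/elasticsearch/certs/chain_cert.crt"]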
I am still experiencing the problem where no data is coming through from the auditd, system, and linux integrations.
Reiterating my scenario...
I'm using Elastic 7.17, self-managed.
I've set up three Elasticsearch nodes on RHEL 8 in AWS EC2. Each of these has additionally been set up as a Fleet Server.
I've set up a fourth RHEL 8 EC2 host for Kibana.
All four hosts have certificates signed by a certificate authority we set up using AWS Certificate Manager.
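For what it's worth, a quick way to check a host certificate against that chain (paths taken from the fleet.yml files below, where <name> is the host-specific certificate) would be:

openssl verify -CAfile /etc/elasticsearch/certs/chain_cert.crt /etc/elasticsearch/certs/<name>.crt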
We are using an AWS NLB for managing traffic, so the Fleet Settings are:
Fleet Server Hosts: https://<nlb dns name from AWS>:8220
Elasticsearch Hosts: https://<nlb dns name from AWS>:9200
On the NLB we have set up listeners for the two ports above. Each one is forwarding to a target group that is comprised of the three Elasticsearch nodes.
On the Elasticsearch EC2 instances a security group has been assigned with inbound rules for ports 8220, 9200, and 9300 all allowing TCP traffic from the VPC CIDR.
On the kibana EC2 instance a security group has been assigned with inbound rules for ports 5601 and 443 allowing https traffic from our application load balancer.
On the kibana instance, in the Agent/data/elastic-agent-*/logs/default/filebeat-json.log file, I see the following messages repeating:
{"log.level":"error","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": dial tcp [::1]:9200: connect: connection refused","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 94 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":" done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"esclientleg","log.origin":{"file.name":"transport/logging.go","file.line":37},"message":"Error dialing dial tcp [::1]:9200: connect: connection refused","service.name":"filebeat","network":"tcp","address":"localhost:9200","ecs.version":"1.6.0"}
On the elasticsearch nodes in the same file I see these messages repeating:
{"log.level":"error","@timestamp":"2022-03-15T17:11:58.650Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": EOF","service.name":"filebeat","ecs.version":"1.6.0"}{"log.level":"info","@timestamp":"2022-03-15T17:11:58.650Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 113reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
Metricbeat and Filebeat are in a perpetual "configuring" state; I've never seen them change to "healthy".
elastic-agent status
Status: HEALTHY
Message: (no message)
Applications:
  * metricbeat_monitoring (CONFIGURING)
      Updating configuration
  * endpoint-security (HEALTHY)
      Protecting with policy {bd328999-4957-44fd-9e57-75aad67d7302}
  * filebeat (CONFIGURING)
      Updating configuration
  * fleet-server (HEALTHY)
      Running on policy with Fleet Server integration: 499b5aa7-d214-5b5d-838b-3cd76469844e
  * metricbeat (CONFIGURING)
      Updating configuration
  * filebeat_monitoring (CONFIGURING)
      Updating configuration
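To see exactly which output configuration the agent is handing to Filebeat and Metricbeat (and whether the localhost:9200 comes from the rendered policy or from some built-in default), I believe the full running configuration can be dumped on any of these hosts with:

sudo elastic-agent inspect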
This is the fleet.yml file on one of the elasticsearch nodes:
agent:
  id: f2f35a6f-8bbb-4a74-8d49-97424926516b
  headers: {}
  logging.level: info
  monitoring.http:
    enabled: false
    host: ""
    port: 6791
fleet:
  access_api_key: <key>
  agent:
    id: ""
  enabled: true
  host: <nlb dns name from AWS>:8220
  protocol: https
  proxy_disable: true
  reporting:
    check_frequency_sec: 30
    threshold: 10000
  server:
    host: 0.0.0.0
    internal_port: 8221
    output:
      elasticsearch:
        hosts:
          - localhost:9200
        protocol: https
        proxy_disable: false
        proxy_headers: null
        service_token: <token>
        ssl:
          certificate_authorities:
            - /etc/elasticsearch/certs/chain_cert.crt
          renegotiation: never
          verification_mode: ""
    policy:
      id: 499b5aa7-d214-5b5d-838b-3cd76469844e
    port: 8220
    ssl:
      certificate: /etc/elasticsearch/certs/<name>.crt
      key: /etc/elasticsearch/certs/<name>.key
      renegotiation: never
      verification_mode: ""
  ssl:
    certificate_authorities:
      - /etc/elasticsearch/certs/chain_cert.crt
    renegotiation: never
    verification_mode: ""
  timeout: 10m0s
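If I understand the layout correctly (and this is an assumption on my part), the localhost:9200 under fleet.server.output only applies to the Fleet Server process itself, and Filebeat/Metricbeat should instead receive the Elasticsearch hosts configured in Fleet Settings. So I'd expect the rendered policy on each agent to contain an output block along these lines, rather than the localhost default seen in the logs:

outputs:
  default:
    type: elasticsearch
    hosts:
      - https://<nlb dns name from AWS>:9200
    ssl:
      certificate_authorities:
        - /etc/elasticsearch/certs/chain_cert.crt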
And this is the fleet.yml from the kibana host:
agent:
  id: 80f20b0c-aa72-401a-a034-1bb4ca2400f7
  headers: {}
  logging.level: info
  monitoring.http:
    enabled: false
    host: ""
    port: 6791
fleet:
  access_api_key: <key>
  agent:
    id: ""
  enabled: true
  host: <nlb dns name from AWS>:8220
  hosts:
    - https://<nlb dns name from AWS>:8220
  protocol: http
  reporting:
    check_frequency_sec: 30
    threshold: 10000
  ssl:
    certificate_authorities:
      - /etc/kibana/certs/chain_cert.crt
    renegotiation: never
    verification_mode: none
  timeout: 10m0s
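One difference I notice here compared to the Elasticsearch nodes is protocol: http and verification_mode: none on this agent. If that turns out to be part of the problem, I assume the cleanest fix would be to re-enroll this agent against the NLB with the CA, something like (enrollment token redacted):

sudo elastic-agent enroll --url=https://<nlb dns name from AWS>:8220 --enrollment-token=<token> --certificate-authorities=/etc/kibana/certs/chain_cert.crt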
I believe the same problem has been posted here, though that person is working with a Windows host: