But none of my normal agents can connect. Their status shows a "connection refused" message, and I see entries like this in my normal agents' logs:
{"log.level":"error","@timestamp":"2023-03-23T11:43:44.771-0700","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":194},"message":"Cannot checkin in with fleet-server, retrying","log":{"source":"elastic-agent"},"error":{"message":"fail to checkin to fleet-server: all hosts failed: 2 errors occurred:\n\t* requester 0/2 to host https://< new broken fleet >.example.org:8220/ errored: Post \"https://< new broken fleet >.example.org:8220/api/fleet/agents/bfeccd8d-6289-4c9c-89a9-cf66edbadb3b/checkin?\": dial tcp < internal ip >:8220: connect: connection refused\n\t* requester 1/2 to host https://ws-prod-sql-01.example.org:8220/ errored: Post \"https://< turned off working fleet >.example.org:8220/api/fleet/agents/bfeccd8d-6289-4c9c-89a9-cf66edbadb3b/checkin?\": dial tcp < turned off working fleet internal ip >:8220: connect: connection refused\n\n"},"request_duration_ns":4164942,"failed_checkins":3,"retry_after_ns":310915958531,"ecs.version":"1.6.0"}
I've confirmed the firewall has port 8220 open, but nmap from a normal host to the fleet host reports the port as closed (not filtered, so the firewall isn't what's blocking it). On the VM with the Fleet Server agent, netstat shows something listening on localhost:8220, but nothing on the VM's actual IP. That's with Fleet Server configured to use "0.0.0.0", so I'd expect it to be listening on all the NICs assigned to the VM.
This is all on 8.6.2 for ES/Kibana/Agent. The VMs are running Ubuntu 22.04. ES is installed via the apt repos.
Oddly, on the server with a working fleet-server integration, it seems to be listening on IPv6. I'm not sure why that works when we don't have IPv6 configured on our network...
I also tried setting the Host field in the integration settings to the specific IP address of the NIC I want fleet-server on. It did not make any difference.
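For anyone wanting to reproduce the "closed, not filtered" check without nmap, here is a minimal Python sketch. The function name `probe_tcp` and the result labels are my own, not part of any Elastic tooling. The idea is that an actively refused connection usually means the packet reached the host but nothing is listening on that address, while a timeout is more typical of a firewall silently dropping traffic.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Try one TCP connect and report 'open', 'refused' or 'no-response'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing bound there.
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall dropping the packets.
        return "no-response"
    except OSError as exc:
        # DNS failure, unreachable network, and so on.
        return f"error: {exc}"
```

Run from a normal agent host against the fleet host's name and port 8220, a result of "refused" would match what nmap reports as closed, and would point at the bind address rather than the firewall.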
Just to double check, the 127.0.0.1:8220 column value means that fleet-server is listening for connections on localhost, right? And the 0.0.0.0:* column value means it will accept connections from any address and port, right?
So what I want to see is 0.0.0.0:8220 in the column that currently has 127.0.0.1:8220. Correct?
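To make the distinction between those two columns concrete, here is a small Python sketch that classifies a netstat-style local address. The function name `listen_scope` and its three labels are hypothetical, chosen only for illustration. It assumes the value has the form `address:port`, as in the netstat output discussed above.

```python
import ipaddress


def listen_scope(local_address: str) -> str:
    """Classify a 'Local Address' value such as '127.0.0.1:8220'."""
    # Split on the last colon so IPv6 forms like ':::8220' also work.
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")  # some tools print IPv6 as [::]:8220
    if host == "*":
        return "all-interfaces"
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all-interfaces"
    if addr.is_loopback:  # 127.0.0.0/8 or ::1
        return "loopback-only"
    return "single-address"


# Example calls:
# listen_scope("127.0.0.1:8220")  -> "loopback-only"
# listen_scope("0.0.0.0:8220")    -> "all-interfaces"
# listen_scope(":::8220")         -> "all-interfaces"
```

A listener classified as "loopback-only" cannot be reached from other hosts, which is consistent with the connection refused errors on the agents. As for the IPv6 listener on the working server: on Linux, a socket bound to the IPv6 wildcard `::` normally accepts IPv4 connections too (unless `net.ipv6.bindv6only` is enabled), which may be why it works on a network without IPv6 configured.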
I pulled down the output of elastic-agent diagnostics. Diffing the components.yaml, computed-config.yaml, pre-config.yaml, state.yaml, and variables.yaml files shows no differences beyond different hostnames or the order in which things were output.
I also tried enabling IPv6 on the non-working VM. Didn't help.
It turns out that because I'm using insecure mode, Agent defaults to binding only on localhost, which you need to override with the --fleet-server-host flag on install.
I never found any documentation stating that you need to use the --fleet-server-host flag when installing insecurely. A nice person on Slack happened to know what was going on and clued me in.