Fail to checkin to fleet-server

Hi All,

I have successfully enrolled my remote server/machine into my Fleet server and I can see my metrics and logs coming through.

The issue is that at the beginning of the enrollment the status of the agent in Kibana was Updating, then it turned to Offline, and it has never been Online since.

The fleet server is Online:

Here are the results of elastic agent status:


elastic-agent status
State: HEALTHY
Message: Running
Fleet State: FAILED
Fleet Message: fail to checkin to fleet-server: all hosts failed: 1 error occurred:
	* requester 0/1 to host https://localhost:8221/ errored: Post "https://localhost:8221/api/fleet/agents/f82222be-ec5e-49e2-a584-4f9c74bcf610/checkin?": dial tcp [::1]:8221: connect: network is unreachable


Components:
  * filestream      (HEALTHY)
                    Healthy: communicating with pid '795'
  * log             (HEALTHY)
                    Healthy: communicating with pid '775'
  * system/metrics  (HEALTHY)
                    Healthy: communicating with pid '780'
  * beat/metrics    (HEALTHY)
                    Healthy: communicating with pid '785'
  * http/metrics    (HEALTHY)
                    Healthy: communicating with pid '786'

The error shown in the Elastic Agent log file /opt/Elastic/Agent/data/elastic-agent-10dc6a/logs/elastic-agent-20230524.ndjson is this:


{"log.level":"error","@timestamp":"2023-05-24T09:34:18.986Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":197},"message":"Cannot checkin in with fleet-server, retrying","log":{"source":"elastic-agent"},"error":{"message":"fail to checkin to fleet-server: all hosts failed: 1 error occurred:\n\t* requester 0/1 to host https://localhost:8221/ errored: Post \"https://localhost:8221/api/fleet/agents/f82222be-ec5e-49e2-a584-..../checkin?\": dial tcp [::1]:8221: connect: network is unreachable\n\n"},"request_duration_ns":237691,"failed_checkins":100,"retry_after_ns":553251097583,"ecs.version":"1.6.0"}
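
For readers following along, the relevant fields of that log entry can be pulled out with a few lines of Python. This is only an illustrative sketch: `SAMPLE` is a shortened copy of the entry above, and `summarize_checkin_error` is a hypothetical helper name, not part of Elastic Agent. It makes visible that the agent dialed `[::1]:8221`, which is the IPv6 loopback address.

```python
import json
import re

# Shortened copy of the log entry quoted above (agent ID elided as in the post).
SAMPLE = r'''{"log.level":"error","message":"Cannot checkin in with fleet-server, retrying","error":{"message":"fail to checkin to fleet-server: all hosts failed: 1 error occurred:\n\t* requester 0/1 to host https://localhost:8221/ errored: Post \"https://localhost:8221/api/fleet/agents/f82222be-ec5e-49e2-a584-..../checkin?\": dial tcp [::1]:8221: connect: network is unreachable\n\n"},"failed_checkins":100}'''


def summarize_checkin_error(line: str) -> dict:
    """Extract the target host and the dial failure from one ndjson log line.

    Hypothetical helper for illustration; the regexes assume the message
    layout seen in this thread and may not match other agent versions.
    """
    entry = json.loads(line)
    message = entry.get("error", {}).get("message", "")
    host = re.search(r"to host (\S+) errored", message)
    dial = re.search(r"dial tcp (\S+): (.+)", message)
    return {
        "host": host.group(1) if host else None,
        "dialed": dial.group(1) if dial else None,
        "reason": dial.group(2) if dial else None,
        "failed_checkins": entry.get("failed_checkins"),
    }


if __name__ == "__main__":
    for key, value in summarize_checkin_error(SAMPLE).items():
        print(f"{key}: {value}")
```

Running it over each line of the .ndjson file would show whether every failed check-in targets the same host and address.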

Why is the agent trying to connect to port 8221? During enrollment it was 8220!

I came across this thread but couldn't understand it or find a solution in it.

Your help is much appreciated.

Dear @stephenb @axw @leandrojmp,

I can see that you are the champions of the Elastic Agent support forum. I have searched a lot for this issue and found no one who has mentioned it before.

Can you please take a look into this?

Thanks in advance and sorry for any annoyance. :pray:

Hi @ethical20,

Could you provide the command used for the enrollment and a screenshot of the Fleet > Settings tab?

Thanks,
Cristina

Hi @Cristina_Amico

Here is the command used for enrollment:

./elastic-agent install --url=https://IP:5045 --fleet-server-es=https://IP:5050 --fleet-server-service-token=AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuLTE2ODQ4NTUwODY5NjM6ZDMtd3VybGJTNEdXWE1.... --enrollment-token=QktyWlVvZ0I0TC16Q05jdFU0ZzM6d2VaYUtLbndTeDZpeWhTS....== --certificate-authorities=/etc/ssl/certs/elasticsearch-ca.pem

And here is the Fleet Settings page:

Thanks in advance

In the command you used to enroll the agent, I see the option --fleet-server-service-token, which shouldn't be there. It should be used only when enrolling a Fleet Server.

./elastic-agent install \
  --url=https://IP:5045 \
  --fleet-server-es=https://IP:5050 \
  --fleet-server-service-token=*** \
  --enrollment-token=*** \
  --certificate-authorities=/etc/ssl/certs/elasticsearch-ca.pem

There is a reference for the available commands here. Could you try enrolling the agent without this flag and see how it goes?

Thanks,
Cristina

Thanks @Cristina_Amico

I first tried your suggestion and removed --fleet-server-service-token, and I got this error:

{"log.level":"info","@timestamp":"2023-05-26T11:15:34.180Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":407},"message":"Generating self-signed certificate for Fleet Server","ecs.version":"1.6.0"}
Error: invalid connection string: must include a service token
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.7/fleet-troubleshooting.html
Error: enroll command failed with exit code: 1

I then removed --fleet-server-es as well:

./elastic-agent install --url=https://IP:5045 --enrollment-token=*** --certificate-authorities=/etc/ssl/certs/elasticsearch-ca.pem
Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:y
{"log.level":"info","@timestamp":"2023-05-26T11:16:40.287Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":475},"message":"Starting enrollment to URL: https://IP:5045/","ecs.version":"1.6.0"}
Error: fail to enroll: fail to execute request to fleet-server: dial tcp IP:5045: connect: connection refused

I checked the server that refused the connection, and it shows the following message:

connect_to 192.68.0.7 port 8220: failed

I checked my Fleet Server and I can see the following listening ports:

kibana@kibana:~$ netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:8221          0.0.0.0:*               LISTEN      708/fleet-server
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      669/sshd: /usr/sbin
tcp        0      0 192.168.0.7:5601        0.0.0.0:*               LISTEN      611/node
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      590/systemd-resolve
tcp        0      0 127.0.0.1:6791          0.0.0.0:*               LISTEN      607/elastic-agent
tcp        0      0 127.0.0.1:6789          0.0.0.0:*               LISTEN      607/elastic-agent
tcp6       0      0 :::22                   :::*                    LISTEN      669/sshd: /usr/sbin
tcp6       0      0 :::8220                 :::*                    LISTEN      708/fleet-server
udp        0      0 127.0.0.53:53           0.0.0.0:*                           590/systemd-resolve
udp6       0      0 fe80::5c6b:3cff:... :::*                                588/systemd-network



kibana@kibana:~$ sudo netstat -pln | grep 8220
tcp6       0      0 :::8220                 :::*                    LISTEN      708/fleet-server

Does this mean Fleet Server is listening on 8220 only for IPv6, even though I'm only using IPv4?

Any idea how to make Fleet Server listen on 8220?
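
A note on reading that netstat output: a `tcp6` listener on the wildcard address `:::8220` does not by itself mean IPv6 only. On Linux, unless the socket sets `IPV6_V6ONLY` (or `net.ipv6.bindv6only` is 1), such a socket usually accepts IPv4 connections too. The Python sketch below just pulls the fleet-server listeners out of netstat output so the wildcard and loopback bindings are easy to tell apart. `listeners` and `is_wildcard` are hypothetical helper names, and `SAMPLE` is an excerpt of the output above.

```python
# Excerpt of the `netstat -tulpn` output posted above.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:8221          0.0.0.0:*               LISTEN      708/fleet-server
tcp        0      0 192.168.0.7:5601        0.0.0.0:*               LISTEN      611/node
tcp6       0      0 :::8220                 :::*                    LISTEN      708/fleet-server
"""


def listeners(netstat_output: str, program: str) -> list:
    """Return (proto, host, port) for every LISTEN socket owned by `program`."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and rows without a LISTEN state (e.g. UDP rows).
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        proto, local, owner = fields[0], fields[3], fields[6]
        if owner.partition("/")[2] == program:
            # rpartition handles IPv6 forms such as ":::8220".
            host, _, port = local.rpartition(":")
            found.append((proto, host, int(port)))
    return found


def is_wildcard(host: str) -> bool:
    """True when the socket is bound to all addresses rather than one interface."""
    return host in ("0.0.0.0", "::")


if __name__ == "__main__":
    for proto, host, port in listeners(SAMPLE, "fleet-server"):
        scope = "all interfaces" if is_wildcard(host) else "this address only"
        print(f"{proto} {host} port {port}: {scope}")
```

In this excerpt, 8220 is bound to the wildcard address while 8221 is bound to 127.0.0.1 only, so 8221 would not be reachable from other machines.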

Thanks

Are you intending to run a local fleet-server process on the agent?

From the context in this thread I don't think you are, so the --fleet-server-* flags should be removed from the install command (as you did).

In your comment, you try to enroll to IP:5045, is that where fleet-server is running?
Can you run curl -v IP:5045/api/status to verify if the server is running?

How did you bootstrap the fleet-server?
One stumbling block we have is that in order to run on a non-default port (5045 instead of 8220) you need to pass the --fleet-server-port=5045 flag on installation (the --url option is not used to specify the port on fleet-server installation).

Finally, what is the fleet-server host in Kibana?

Thanks @MichelLaterman

Please bear with me for the detailed scenario below.

The Kibana host you see in the image below is the local machine where I installed the Kibana service (https://192.168.0.7:5601). I also installed a Fleet Server on the same machine, which should be listening on 192.168.0.7:8220.

I also have another Fleet Server, which is remote (not local).

Both Fleet Servers are now Online and Healthy.

Enrolling a local machine (192.168.0.101) to the local Fleet Server (192.168.0.7) on port 8220 succeeded.

My issue now is enrolling an agent on a remote machine to the remote Fleet Server.

To rule out the non-default port 5045, which might mislead us, my infrastructure now has two Fleet Servers (local and remote), both on 8220.

Trying to enroll a remote machine to the remote Fleet Server gave the error below:

~/elastic-agent-8.8.0-linux-x86_64# ./elastic-agent install --url=https://IP:8220 --enrollment-token=*** --certificate-authorities=/etc/ssl/certs/elasticsearch-ca.pem
Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:y
{"log.level":"info","@timestamp":"2023-06-01T08:25:11.177Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":478},"message":"Starting enrollment to URL: https://IP:8220/","ecs.version":"1.6.0"}
Error: fail to enroll: fail to execute request to fleet-server: dial tcp IP:8220: connect: connection timed out
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.8/fleet-troubleshooting.html

The result of curl -v IP:8200/api/status is:

curl -v https://IP:8200/api/status
*   Trying IP:8200...
* connect to IP port 8200 failed: Connection timed out
* Failed to connect to IP port 8200 after 129499 ms: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to IP port 8200 after 129499 ms: Connection timed out

Surprisingly, curl -v https://192.16.0.7:8200/api/status against the local Fleet Server, which is Online and Healthy and to which I can enroll agents successfully, gave this:

*   Trying 192.16.0.7:8200...
* connect to 192.16.0.7 port 8200 failed: Connection timed out
* Failed to connect to 192.16.0.7 port 8200 after 129895 ms: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to 192.16.0.7 port 8200 after 129895 ms: Connection timed out
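
As an aside for readers, this thread contains both "connection refused" and "connection timed out". They usually point at different problems: a refusal means the host answered but nothing accepted the connection on that address and port, while a timeout typically means packets were dropped by a firewall, sent to the wrong address, or sent to a port that is filtered. The curl commands above target port 8200 (and in one case 192.16.0.7), whereas the Fleet Server is described as listening on 192.168.0.7:8220. If those are not just typos in the post, a timeout would be expected. The Python sketch below classifies a plain TCP connect attempt. `probe` is a hypothetical helper, and it does not perform a TLS handshake or call /api/status.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as open, refused, timeout, or another error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but no listener on this address/port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: often a firewall drop or a wrong address/port.
        return "timeout"
    except OSError as exc:
        # E.g. "network is unreachable" or a name resolution failure.
        return f"error: {exc}"


if __name__ == "__main__":
    # Addresses and ports taken from this thread; adjust for your own setup.
    for host, port in [("192.168.0.7", 8220), ("192.168.0.7", 8200)]:
        print(host, port, probe(host, port))
```

Running it from the machine that fails to enroll, against the exact host and port passed to --url, would show whether the port is reachable at all before certificates or tokens come into play.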

The only way to get a machine enrolled to the Fleet is by installing it as a Fleet server using this command:

./elastic-agent install --url=https://IP:8220 \
  --fleet-server-es=https://IP:5050 \
  --fleet-server-service-token=AAEAAWV*** \
  --fleet-server-policy=fleet-server-policy \
  --certificate-authorities=/root/certs/elasticsearch-ca.pem \
  --fleet-server-es-ca=/root/certs/elasticsearch-ca.pem \
  --fleet-server-cert=/root/certs/fleet-server.crt \
  --fleet-server-cert-key=/root/certs/fleet-server.key

That works, but the machines then have the role of a Fleet Server instead of a regular agent.

Any help is much appreciated.

Are you using the argument --url=https://IP:8220 verbatim, or --url=https://192.168.0.7:8220?
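
To illustrate why the distinction matters: a literal `IP` in the URL is treated as a host name that has to resolve via DNS, not as an address. The small Python sketch below shows how the two forms of --url are interpreted. `describe_fleet_url` is a hypothetical helper written for this example, not something Elastic Agent provides.

```python
import ipaddress
from urllib.parse import urlsplit


def describe_fleet_url(url: str) -> dict:
    """Split a Fleet Server URL and say whether its host is an IP literal or a name."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the host
    try:
        ipaddress.ip_address(host)
        kind = "ip"
    except ValueError:
        kind = "name"  # would need DNS (or /etc/hosts) to resolve
    return {"scheme": parts.scheme, "host": host, "port": parts.port, "host_kind": kind}


if __name__ == "__main__":
    for url in ("https://IP:8220", "https://192.168.0.7:8220"):
        print(url, describe_fleet_url(url))
```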