I was running Elastic 8.8.0 and all my agents were online (green). Upgrading the stack went smoothly, but after I upgraded the Elastic Agents from Fleet to 8.9.1, the agents show as offline even though data is still coming in.
Upgrading to 8.11.1 didn't solve the issue either, and I still get this error:
~/elastic-agent-8.11.0-linux-x86_64# ./elastic-agent install --url=https://X.X.X.X:8220 --enrollment-token=QktyWlVvZ0I0TC16Q05jdFU0ZzM6d2VaYUtLbndTeDZpeWhTS2tRMz...== --certificate-authorities=/etc/ssl/certs/elasticsearch-ca.pem
Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:y
Copying files.................................................................................................................................................................................................................................................................................................................................................................................. DONE
Installing service..... DONE
Starting service... DONE
Enrolling Elastic Agent with Fleet......{"log.level":"info","@timestamp":"2023-11-14T04:31:13.414-0800","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":479},"message":"Starting enrollment to URL: https://X.X.X.X:8220/","ecs.version":"1.6.0"}
..Error: fail to enroll: fail to execute request to fleet-server: x509: cannot validate certificate for X.X.X.X because it doesn't contain any IP SANs
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html
FAILED
Stopping service....... DONE
Uninstalling...
Stopping service... DONE
Stopping upgrade watcher; none found... DONE
. Removing service.... DONE
Removing install directory... DONE
DONE
Error: enroll command failed for unknown reason: exit status 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html
P.S. The agents that have NOT been upgraded are still online (green).
On the agents that show as offline but are still running, I can see these log entries:
13:41:06.958
elastic_agent
[elastic_agent][error] Cannot checkin in with fleet-server, retrying
13:46:45.519
elastic_agent
[elastic_agent][error] ack retrier: commit failed with error: acknowledge 1 actions '[action_id: policy:5c65d5f0-faf6-11ed-ad81-bfb90094819d:20:1, type: POLICY_CHANGE]' for elastic-agent '5a50aea4-334f-4e83-ac35-32e00a6ed204' failed: fail to ack to fleet: all hosts failed: 1 error occurred:
* requester 0/1 to host https://X.X.X.X:8220/ errored: Post "https://X.X.X.X:8220/api/fleet/agents/5a50aea4-334f-4e83-ac35-32e00a6ed204/acks?": x509: cannot validate certificate for X.X.X.X because it doesn't contain any IP SANs
Please note that I didn't touch the certs; I only ran an apt upgrade for elasticsearch and kibana.
openssl x509 -noout -text -in /etc/ssl/certs/elasticsearch-ca.pem
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
59:8b:77:36:dd:2c:9d:a4:0a:2c:d8:1a:6b:6b:d2:d0:70:3e:70:X
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = Elastic Certificate Tool Autogenerated CA
Validity
Not Before: May 23 08:23:42 2023 GMT
Not After : May 22 08:23:42 2026 GMT
Subject: CN = Elastic Certificate Tool Autogenerated CA
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:c6:72:b4:e4:c9:b6:18:b7:fd:d3:e6:4a:59:d6:
26:13:4b:f0:75:d6:d9:9f:6c:1a:5b:3d:e9:17:cd:
d6:1b:9f:af:83:b4:83:93:d1:c5:ef:26:03:7f:db:
f8:ea:a7:d1:1c:e4:6b:e7:da:ac:56:bf:4c:07:72:
4b:c5:47:13:30:b8:93:a7:ff:f8:e9:22:3c:3b:0c:
8f:b2:a8:4b:98:dd:95:7d:e1:c1:19:15:4f:2a:70:
09:a8:17:82:0a:3e:13:5a:9f:50:dd:fc:ff:0a:9a:
b8:a3:78:dd:9b:07:58:9d:5d:b3:46:0e:06:f7:cd:
03:e3:ab:88:0e:ee:8e:26:49:65:34:51:26:b6:b4:
81:2a:5b:c5:1d:18:43:ce:cf:fb:db:4c:33:7b:06:
8e:b5:8c:0a:0e:ca:7c:c1:72:a7:6f:93:91:fb:3e:
80:13:40:ed:8f:e1:7d:76:cd:ca:74:9c:56:13:d7:
c3:ce:30:8f:be:59:68:77:f7:95:45:b6:34:85:dd:
ed:94:1b:73:df:36:59:31:e6:d0:af:76:e6:02:2f:
55:22:2b:df:94:97:07:c6:ca:7b:3d:9d:26:55:50:
dc:e6:b7:17:d3:02:05:d8:38:53:46:14:f5:9e:2a:
X
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Subject Key Identifier:
EA:6E:82:1F:A4:D4:48:28:0C:E0:A5:E0:66:CA:02:9D:88:1E:X:X
X509v3 Authority Key Identifier:
EA:6E:82:1F:A4:D4:48:28:0C:E0:A5:E0:66:CA:02:9D:88:1E:X:X
X509v3 Basic Constraints: critical
CA:TRUE
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
81:a1:00:cf:fd:b2:03:c2:6a:1f:f9:5b:b8:c2:93:05:9f:02:
eb:45:8c:6d:47:49:a0:0d:ca:ee:be:b4:e4:9f:6a:e8:c8:77:
XX
Please note that this same cert had been working fine for months on all agents; I only got this issue after upgrading. Moreover, the non-upgraded agent is still online with the same cert, while the upgraded one is not!
Yes, I can confirm that this was OK on version 8.9.1 and that the issue appears on version 8.9.2 and up (an agent on 8.9.1 is still online).
But what exactly changed for this to stop working? Also, I think I had already included the IP of the Fleet Server as a SAN in the cert; otherwise it wouldn't work on 8.9.1, right?
How can I check whether the cert includes the Fleet Server IP as a SAN?
I mean, which cert do I have to check? Is it the elasticsearch-ca.pem used in the agent installation above?
I think your initial error message indicates that the Fleet Server's certificate is missing an IP SAN.
You will need to check (and change) the Fleet Server's certificate, i.e. the one served on port 8220 by the host where Fleet Server is installed. Above, you were checking Elasticsearch's CA certificate instead.
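To see what the Fleet Server actually presents, one option is to dump its certificate with something like `openssl s_client -connect X.X.X.X:8220 </dev/null | openssl x509 -noout -text` and look for the "X509v3 Subject Alternative Name" extension. The sketch below is only an illustration of that check: it parses the text output and reports whether a given IP appears among the SANs. The function names, the hostname `fleet.example.internal`, and the address `192.0.2.10` are made up for the example, and it assumes the usual layout where the SAN entries sit on the line right after the extension header.

```python
import ipaddress


def extract_sans(openssl_text: str) -> dict:
    """Collect DNS and IP SAN entries from `openssl x509 -noout -text` output."""
    sans = {"dns": [], "ip": []}
    lines = openssl_text.splitlines()
    for i, line in enumerate(lines):
        if "X509v3 Subject Alternative Name" in line and i + 1 < len(lines):
            # Entries follow on the next line, e.g.
            # "DNS:fleet.example.internal, IP Address:192.0.2.10"
            for entry in lines[i + 1].split(","):
                kind, _, value = entry.strip().partition(":")
                if kind == "DNS":
                    sans["dns"].append(value)
                elif kind == "IP Address":
                    sans["ip"].append(value)
    return sans


def covers_ip(openssl_text: str, ip: str) -> bool:
    """True if the certificate text lists `ip` as an IP SAN."""
    wanted = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(value) == wanted
        for value in extract_sans(openssl_text)["ip"]
    )


# Hypothetical excerpts of openssl output, not taken from the thread.
SAMPLE_WITHOUT_IP = """
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:fleet.example.internal
"""

SAMPLE_WITH_IP = """
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:fleet.example.internal, IP Address:192.0.2.10
"""

if __name__ == "__main__":
    print(covers_ip(SAMPLE_WITHOUT_IP, "192.0.2.10"))
    print(covers_ip(SAMPLE_WITH_IP, "192.0.2.10"))
```

If the Fleet Server certificate turns out to have no IP SAN for the address in the enrollment URL, the options are to reissue it with that IP as a SAN or to enroll against a hostname that the certificate does list. Note that a CA certificate such as the elasticsearch-ca.pem dumped above normally carries no SANs at all, so running the same check on it tells you nothing about the Fleet Server.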