Fleet server does not start properly

Hi there! I'm running an on-prem ELK stack, version 7.14, with each service on a different VM. It has been running properly for 2 months, shipping logs with Filebeat, etc. Kibana and Logstash use production certs for HTTPS, but the Elasticsearch cluster uses autogenerated certs.

I deployed a new server to host a Fleet Server and move my Filebeat inputs to Elastic Agents, but I get an error when I try to do so.

The fleet server logs:

{"log.level":"info","service.name":"fleet-server","@timestamp":"2021-08-24T15:40:05.516Z","message":"starting communication connection back to Elastic Agent"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2021-08-24T15:40:05.516Z","message":"waiting for Elastic Agent to send initial configuration"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2021-08-24T15:40:06.03Z","message":"received initial configuration starting Fleet Server"}
{"log.level":"info","service.name":"fleet-server","cfg":{"NumCounters":500000,"MaxCost":52428800,"ActionTTL":300000000000,"ApiKeyTTL":900000000000,"EnrollKeyTTL":60000000000,"ArtifactTTL":86400000000000,"ApiKeyJitter":300000000000},"@timestamp":"2021-08-24T15:40:06.03Z","message":"makeCache"}
{"log.level":"info","service.name":"fleet-server","status":"STARTING","@timestamp":"2021-08-24T15:40:06.032Z","message":"Starting"}
{"service.name":"fleet-server","log.level":"info","log.logger":"fleet-metrics.api","message":"Starting stats endpoint","@timestamp":"2021-08-24T15:40:06.033Z"}
{"service.name":"fleet-server","log.level":"info","log.logger":"fleet-metrics.api","message":"Metrics endpoint listening on: /var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock (configured: unix:///var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock)","@timestamp":"2021-08-24T15:40:06.033Z"}
{"log.level":"info","service.name":"fleet-server","name":"VoxelInfraLogs","uuid":"e2GiZmRxTreCCF2gg_dqww","vers":"7.14.0","@timestamp":"2021-08-24T15:40:06.063Z","message":"Cluster Info"}
{"log.level":"info","service.name":"fleet-server","opts":{"flushInterval":250,"flushThresholdCnt":2048,"flushThresholdSz":1048576,"maxPending":8,"blockQueueSz":32,"apikeyMaxParallel":120},"@timestamp":"2021-08-24T15:40:06.064Z","message":"Run bulker with options"}
{"log.level":"info","service.name":"fleet-server","fleet_version":"7.14.0","elasticsearch_version":"7.14.0","@timestamp":"2021-08-24T15:40:06.065Z","message":"versions are compatible"}
{"log.level":"info","service.name":"fleet-server","name":"VoxelInfraLogs","uuid":"e2GiZmRxTreCCF2gg_dqww","vers":"7.14.0","@timestamp":"2021-08-24T15:40:06.074Z","message":"Cluster Info"}
{"log.level":"info","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","@timestamp":"2021-08-24T15:40:06.074Z","message":"start"}
{"log.level":"info","service.name":"fleet-server","ctx":"policy agent monitor","throttle":5,"@timestamp":"2021-08-24T15:40:06.074Z","message":"run policy monitor"}
{"log.level":"info","service.name":"fleet-server","index":".fleet-actions","ctx":"index monitor","@timestamp":"2021-08-24T15:40:06.074Z","message":"start"}
{"log.level":"info","service.name":"fleet-server","limits":{"Interval":1000000,"Burst":1000,"Max":0,"MaxBody":1048576},"long_poll_timeout":300000,"long_poll_timestamp":30000,"long_poll_jitter":30000,"@timestamp":"2021-08-24T15:40:06.074Z","message":"Checkin install limits"}
{"log.level":"info","service.name":"fleet-server","limits":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":524288},"@timestamp":"2021-08-24T15:40:06.074Z","message":"Enroller install limits"}
{"log.level":"info","service.name":"fleet-server","limits":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0},"maxParallel":8,"@timestamp":"2021-08-24T15:40:06.074Z","message":"Artifact install limits"}
{"log.level":"info","service.name":"fleet-server","limits":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":2097152},"@timestamp":"2021-08-24T15:40:06.074Z","message":"Ack install limits"}
{"log.level":"info","service.name":"fleet-server","method":"GET","path":"/api/status","@timestamp":"2021-08-24T15:40:06.074Z","message":"Server install route"}
{"log.level":"info","service.name":"fleet-server","method":"POST","path":"/api/fleet/agents/:id","@timestamp":"2021-08-24T15:40:06.074Z","message":"Server install route"}
{"log.level":"info","service.name":"fleet-server","method":"POST","path":"/api/fleet/agents/:id/checkin","@timestamp":"2021-08-24T15:40:06.074Z","message":"Server install route"}
{"log.level":"info","service.name":"fleet-server","method":"POST","path":"/api/fleet/agents/:id/acks","@timestamp":"2021-08-24T15:40:06.074Z","message":"Server install route"}
{"log.level":"info","service.name":"fleet-server","method":"GET","path":"/api/fleet/artifacts/:id/:sha2","@timestamp":"2021-08-24T15:40:06.074Z","message":"Server install route"}
{"log.level":"info","service.name":"fleet-server","bind":"0.0.0.0:8220","rdTimeout":60000,"wrTimeout":600000,"@timestamp":"2021-08-24T15:40:06.074Z","message":"server listening"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2021-08-24T15:40:06.075Z","message":"server hard connection limiter disabled"}
{"log.level":"warn","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2021-08-24T15:40:06.075Z","message":"missing config fleet.agent.id; acceptable until Elastic Agent has enrolled"}
{"log.level":"error","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","error.message":"elastic fail 500:exception:[service account role descriptor resolving] requires TLS for the HTTP interface","@timestamp":"2021-08-24T15:40:06.082Z","message":"failed to initialize the global checkpoints"}
{"log.level":"info","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","error.message":"elastic fail 500:exception:[service account role descriptor resolving] requires TLS for the HTTP interface","@timestamp":"2021-08-24T15:40:06.082Z","message":"exited"}
{"log.level":"info","service.name":"fleet-server","ctx":"policy agent monitor","@timestamp":"2021-08-24T15:42:04.575Z","message":"Exit policy monitor local"}
{"log.level":"info","service.name":"fleet-server","status":"STOPPING","@timestamp":"2021-08-24T15:42:04.575Z","message":"Stopping"}
{"log.level":"error","service.name":"fleet-server","error.message":"elastic fail 500:exception:[service account role descriptor resolving] requires TLS for the HTTP interface","@timestamp":"2021-08-24T15:42:04.575Z","message":"Policy index monitor exited"}
{"log.level":"info","service.name":"fleet-server","index":".fleet-actions","ctx":"index monitor","@timestamp":"2021-08-24T15:42:04.575Z","message":"context closed waiting for global checkpoints advance"}
{"log.level":"info","service.name":"fleet-server","index":".fleet-actions","ctx":"index monitor","@timestamp":"2021-08-24T15:42:04.575Z","message":"exited"}
{"service.name":"fleet-server","log.level":"info","log.logger":"fleet-metrics.api","message":"Stats endpoint (/var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock) finished: accept unix /var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock: use of closed network connection","@timestamp":"2021-08-24T15:42:04.575Z"}
{"log.level":"info","service.name":"fleet-server","error.message":"elastic fail 500:exception:[service account role descriptor resolving] requires TLS for the HTTP interface","@timestamp":"2021-08-24T15:42:04.575Z","message":"Fleet Server exited"}
{"log.level":"info","service.name":"fleet-server","status":"STARTING","@timestamp":"2021-08-24T15:42:04.575Z","message":"Starting"}
{"log.level":"info","service.name":"fleet-server","status":"STOPPING","@timestamp":"2021-08-24T15:42:04.575Z","message":"Stopping"}
{"service.name":"fleet-server","log.level":"info","log.logger":"fleet-metrics.api","message":"Starting stats endpoint","@timestamp":"2021-08-24T15:42:04.575Z"}
{"service.name":"fleet-server","log.logger":"fleet-metrics.api","message":"Metrics endpoint listening on: /var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock (configured: unix:///var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock)","log.level":"info","@timestamp":"2021-08-24T15:42:04.576Z"}
{"log.level":"info","service.name":"fleet-server","name":"VoxelInfraLogs","uuid":"e2GiZmRxTreCCF2gg_dqww","vers":"7.14.0","@timestamp":"2021-08-24T15:42:04.585Z","message":"Cluster Info"}
{"log.level":"error","service.name":"fleet-server","error.message":"context canceled","@timestamp":"2021-08-24T15:42:04.585Z","message":"failed to fetch elasticsearch version"}
{"log.level":"info","service.name":"fleet-server","opts":{"flushInterval":250,"flushThresholdCnt":2048,"flushThresholdSz":1048576,"maxPending":8,"blockQueueSz":32,"apikeyMaxParallel":120},"@timestamp":"2021-08-24T15:42:04.585Z","message":"Run bulker with options"}
{"service.name":"fleet-server","message":"Stats endpoint (/var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock) finished: accept unix /var/lib/elastic-agent/data/tmp/default/fleet-server/fleet-server.sock: use of closed network connection","log.level":"info","log.logger":"fleet-metrics.api","@timestamp":"2021-08-24T15:42:04.585Z"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2021-08-24T15:42:04.585Z","message":"Fleet Server exited"}

I deploy the Fleet Server with this command:
sudo elastic-agent enroll -d debug -f --fleet-server-es=https://<elastic_ip>:9200 --fleet-server-service-token=<proper_token> --fleet-server-policy=id:5fa328c0-f635-11eb-b62c-2d3279f943d3 --fleet-server-es-ca=/etc/elastic-agent/certs/elasticsearch-ca.pem

Here are the elastic-agent logs:

[...]
{"log.level":"info","@timestamp":"2021-08-24T15:40:05.115Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is rEMABmG_","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:40:05.115Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 1 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:40:05.158Z","log.origin":{"file.name":"operation/operator.go","file.line":259},"message":"operation 'operation-install' skipped for fleet-server.7.14.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:40:05.493Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2021-08-24T17:40:05+02:00 - message: Application: fleet-server--7.14.0[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:40:05.494Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":66},"message":"Updating internal state","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:42:04.574Z","log.origin":{"file.name":"cmd/run.go","file.line":189},"message":"Shutting down Elastic Agent and sending last events...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:42:04.574Z","log.origin":{"file.name":"operation/operator.go","file.line":191},"message":"waiting for installer of pipeline 'default' to finish","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:42:04.574Z","log.origin":{"file.name":"process/app.go","file.line":181},"message":"Signaling application to stop because of shutdown: fleet-server--7.14.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:42:05.075Z","log.origin":{"file.name":"cmd/run.go","file.line":197},"message":"Shutting down completed.","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:42:05.075Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2021-08-24T17:42:05+02:00 - message: Application: fleet-server--7.14.0[]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-08-24T15:42:05.075Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":66},"message":"Stats endpoint (/var/lib/elastic-agent/data/tmp/elastic-agent.sock) finished: accept unix /var/lib/elastic-agent/data/tmp/elastic-agent.sock: use of closed network connection","ecs.version":"1.6.0"}

Could anyone shed some light on this?

Thanks in advance,

V

I'm stumbling over the following log line:

{"log.level":"info","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","error.message":"elastic fail 500:exception:[service account role descriptor resolving] requires TLS for the HTTP interface","@timestamp":"2021-08-24T15:40:06.082Z","message":"exited"}

But I also see you are connecting to Elasticsearch through HTTPS. Is this correct? What is your TLS config in Elasticsearch?
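The line quoted above is logged at level "info" even though it carries an error, so filtering the Fleet Server log on log.level alone would miss it. As a minimal sketch (Python, assuming the log is one JSON object per line as in the paste above; the function name `lines_with_errors` and the `SAMPLE` text are illustrative, not part of any Elastic tooling), this picks out every line that has an `error.message` field, whatever its level:

```python
import json


def lines_with_errors(ndjson_text):
    """Return (timestamp, level, message, error) for each log line
    that carries an "error.message" field, regardless of log.level."""
    hits = []
    for raw in ndjson_text.splitlines():
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip blank lines and markers such as "[...]"
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or otherwise malformed lines
        err = entry.get("error.message")  # flat dotted key, as in the paste
        if err:
            hits.append((
                entry.get("@timestamp"),
                entry.get("log.level"),
                entry.get("message"),
                err,
            ))
    return hits


# Two lines copied from the Fleet Server log in this thread.
SAMPLE = """\
{"log.level":"info","service.name":"fleet-server","@timestamp":"2021-08-24T15:40:06.075Z","message":"server hard connection limiter disabled"}
{"log.level":"info","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","error.message":"elastic fail 500:exception:[service account role descriptor resolving] requires TLS for the HTTP interface","@timestamp":"2021-08-24T15:40:06.082Z","message":"exited"}
"""

if __name__ == "__main__":
    for ts, level, msg, err in lines_with_errors(SAMPLE):
        print(ts, level, msg, "->", err)
```

Keying on the presence of `error.message` rather than on the level is a deliberate choice here, since the same failure shows up under both "error" and "info" in this log.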

@ruflin Could it be that HTTPS is handled by an HTTP reverse proxy rather than by Elasticsearch itself, and that is why the error is showing? Just a thought.

Hi all,

@blaker There is no reverse proxy in that installation.

@ruflin I use the autogenerated certificates. Here is the master node's configuration with the autogenerated certs:

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: full
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/AVVLS-ELAST-001.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: "/etc/elasticsearch/certs/http.p12"
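
Since the error complains about TLS on the HTTP interface and the settings above come from the master node only, one way to rule out a mismatch is to run the same check against the elasticsearch.yml of every node. The following is a rough Python sketch, assuming the file uses flat dotted keys as shown above (nested YAML would need a real YAML parser) and assuming that a missing `xpack.security.http.ssl.enabled` should be treated as disabled. The helper names are made up for illustration:

```python
def parse_flat_settings(text):
    """Parse 'dotted.key: value' lines into a dict.
    Assumes flat keys only; comments and blank lines are ignored."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        settings[key.strip()] = value.strip().strip('"')
    return settings


def http_tls_enabled(settings):
    """True only if TLS on the HTTP interface is explicitly enabled.
    Treating a missing key as disabled is an assumption of this sketch."""
    value = settings.get("xpack.security.http.ssl.enabled", "false")
    return value.lower() == "true"


# The master node settings posted in this thread.
MASTER_NODE = """\
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: full
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/AVVLS-ELAST-001.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: "/etc/elasticsearch/certs/http.p12"
"""

if __name__ == "__main__":
    settings = parse_flat_settings(MASTER_NODE)
    print("HTTP TLS enabled:", http_tls_enabled(settings))
```

For the master node settings above the check passes, so this only helps if another node's file differs. It reads the file on disk and says nothing about what a running node has actually loaded.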

The installation is on Rocky Linux. I tried a clean installation following the same procedure and it works properly. Maybe it is an update issue.

I did a new installation and migrated the data. We can close this case. The workaround was hard.

Sorry for the bumpy road, @Servando_Dominguez. So the reinstall with all the same settings just worked?