Fleet Server status stuck at updating, but seems to be working normally?!

Hello there.

I recently updated a client system from 8.4.3 to 8.5.0. Everything was looking good, after the typical reindexing of shards until they were green or at least yellow.
I then went to upgrade the Fleet Server agent so I could get all the other agents upgraded.

For some reason it still shows as Updating, but when I query its status locally it says healthy and all components are running. If I update the policy, the change also reaches the Fleet Server after a while, so how can I get the Kibana interface to acknowledge this?

root@480velastic1:/opt/Elastic/Agent/data/elastic-agent-9da6ba# ./elastic-agent status
Status: HEALTHY
Message: (no message)
Applications:
  * filebeat               (HEALTHY)
                           Running
  * fleet-server           (HEALTHY)
                           Running on policy with Fleet Server integration: 499b5aa7-d214-5b5d-838b-3cd76469844e
  * metricbeat             (HEALTHY)
                           Running
  * filebeat_monitoring    (HEALTHY)
                           Running
  * metricbeat_monitoring  (HEALTHY)
                           Running

root@480velastic1:/opt/Elastic/Agent/data/elastic-agent-9da6ba# ./elastic-agent version
Binary: 8.5.0 (build: 9da6ba5fce5d6b4d2c473c1f5ff6056794e9a644 at 2022-10-24 20:21:40 +0000 UTC)
Daemon: 8.5.0 (build: 9da6ba5fce5d6b4d2c473c1f5ff6056794e9a644 at 2022-10-24 20:21:40 +0000 UTC)

I was able to update all the other agents in spite of the Fleet Server showing Updating, so it seems to be mostly a visual 'bug', but I would really like to get it back to showing healthy.

Hi Martin,

Unfortunately, after upgrading, some agents can get stuck in "updating" state in Kibana, even if the actual underlying state is "healthy".

To un-mark any Elastic Agent as upgrading, you can run the commands below as superuser.

Create a new service token:

curl -X POST --user "elastic:${SUPERUSER_PASS}" -H 'x-elastic-product-origin:fleet' -H 'content-type:application/json' "https://${ELASTICSEARCH_HOST}/_security/service/elastic/fleet-server/credential/token/fix-agents"
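
The token is returned in the JSON response of the call above. As a minimal sketch, assuming the response has the usual shape of a service token response (a "token" object with "name" and "value" fields), the value can be pulled into a TOKEN variable like this. The helper name extract_token and the sample response are made up for illustration:

```shell
# extract_token: hypothetical helper that reads a create-token response on
# stdin and prints the token value.
# Assumes a response shaped like:
#   {"created":true,"token":{"name":"fix-agents","value":"<secret>"}}
extract_token() {
  sed -n 's/.*"value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up sample response, for illustration only (not a real token).
sample='{"created":true,"token":{"name":"fix-agents","value":"EXAMPLE_TOKEN_VALUE"}}'
TOKEN=$(printf '%s' "$sample" | extract_token)
echo "$TOKEN"

# In practice the curl call that creates the token would be piped in instead:
#   TOKEN=$(curl -s -X POST ... | extract_token)
```

If jq is available, `jq -r .token.value` does the same job more robustly than a sed pattern.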

Next, run an update_by_query. The query finds all Elastic Agents that are marked as updating and marks the update as complete. Replace ${TOKEN} below with the value of the service token created in the previous step so that the request is authorized:

curl -XPOST -H "Authorization: Bearer ${TOKEN}" -H 'x-elastic-product-origin:fleet' -H 'content-type:application/json' "https://${ELASTICSEARCH_HOST}/.fleet-agents/_update_by_query" -d '{"query": {"bool": {"must": [{ "exists": { "field": "upgrade_started_at" } }],"must_not": [{ "exists": { "field": "upgraded_at" } }]}},"script": {"source": "ctx._source.upgraded_at = ctx._source.upgrade_started_at; ctx._source.upgrade_started_at = null;","lang": "painless"}}'
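
Two shell details matter for the headers in this command. The Authorization header has to be in double quotes, otherwise ${TOKEN} is sent literally instead of being expanded. The header strings should also not start with a space: curl sends such a line as-is, and a server may then read it as a continuation of the previous header, which can surface as an "Invalid media-type value on headers [Accept]" error. A small sketch of a pre-flight check for both problems, where check_header is a hypothetical helper and not part of curl:

```shell
# check_header: hypothetical helper that inspects a header string before it
# is passed to curl -H. Prints a verdict and returns non-zero on a problem.
check_header() {
  case "$1" in
    [[:space:]]*) echo "leading whitespace"; return 1 ;;
    *'${'*)       echo "unexpanded variable"; return 1 ;;
    *)            echo "ok" ;;
  esac
}

TOKEN="EXAMPLE_TOKEN_VALUE"   # placeholder value, not a real token

check_header "Authorization: Bearer ${TOKEN}"            # double quotes: expanded
check_header 'Authorization: Bearer ${TOKEN}' || true    # single quotes: literal ${TOKEN}
check_header ' x-elastic-product-origin:fleet' || true   # starts with a space
```

Running curl with -v also shows the request headers exactly as they are sent, which is a quick way to confirm the same thing on a live request.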

Finally, clean up the created service token:

curl -XDELETE --user elastic:${SUPERUSER_PASS} -H 'x-elastic-product-origin:fleet' -H'content-type:application/json' "https://${ELASTICSEARCH_HOST}/_security/service/elastic/fleet-server/credential/token/fix-agents"

Note that this procedure applies to Fleet Servers as well, since they are a special case of agent.

Nice! I'll try this tomorrow :wink:

Well, I seem to get stuck on the second command. I have tried to tweak it here and there, but I either get this below or a message that unauthenticated access is not allowed:

curl -XPOST -H" Authorization: Bearer ${TOKEN}" -H' x-elastic-product-origin:fleet' -H'content-type:application/json' "https://${ELASTICSEARCH_HOST}/.fleet-agents/_update_by_query" -d '{"query": {"bool": {"must": [{ "exists": { "field": "upgrade_started_at" } }],"must_not": [{ "exists": { "field": "upgraded_at" } }]}},"script": {"source": "ctx._source.upgraded_at = ctx._source.upgrade_started_at; ctx._source.upgrade_started_at = null;","lang": "painless"}}'
{"error":{"root_cause":[{"type":"media_type_header_exception","reason":"Invalid media-type value on headers [Accept]"}],"type":"media_type_header_exception","reason":"Invalid media-type value on headers [Accept]","caused_by":{"type":"illegal_argument_exception","reason":"invalid media-type [*/* Authorization: Bearer <value of my token> x-elastic-product-origin:fleet]"}},"status":400}

If I remove the space before x-elastic-product-origin:fleet, then only the Authorization: Bearer part is shown, but the error is the same. I have set ELASTICSEARCH_HOST to my hostname:port

Any ideas?

Hi Martin,

I re-read your question and I might have misunderstood your problem, sorry about that.
I thought that you had agents in "updating" state, but I now realise that it is the Fleet Server itself that is stuck in "updating" state in the UI. Is that correct?

Of course, in this case please disregard my previous response, since it would unenroll your existing agents.
Would you mind sharing some screenshots of what you see?

Thanks,
Cristina
