Orphaned agent is healthy in 9.0.1

Self-hosted installation of ELK 9.0.1 with ~30 agents running:

On a Windows Server 2022 host we deployed an elastic-agent that reports everything is fine (elastic-agent status: Fleet connected and agent healthy).

In Kibana, however, Fleet shows this agent as "orphaned" with the last check-in message: Running.
The logs of this agent show these two "errors":

*[elastic_agent][error] 2025-05-22 11:52:18: info: InstallLib.cpp:668 Installed endpoint is expected version (version: 9.0.1, compiled: Tue Apr 29 21:00:00 2025, branch: HEAD, commit: 36be778dc95d8f92217aed26425759e415111a22)*

*[elastic_agent][error] 2025-05-22 11:52:18: info: Util.cpp:2244 Endpoint Service is running.*

Should I report this directly on GitHub, or is there a known issue/workaround for this?

Those two errors are actually related to Endpoint and not to the problem you're experiencing. I'm not sure we should be reporting those two lines as errors, though, so I'm going to look at that.

I'll see if I can find someone who knows more about the orphaned status to respond, though.

edit: I responded too quickly. Endpoint logged those lines correctly as info; the agent reported them as errors.

Thanks for letting us know. We've recently run into a similar issue on one test setup, where it happened after a stack upgrade.

Was your stack recently upgraded to 9.0?

A little explanation

The "Orphaned" status comes from an audit written by an orphaned Endpoint. The stack communicates with Elastic Endpoint via Elastic Agent. If the Agent stops working, Endpoint sends an "orphaned" audit to clearly differentiate this from the Offline state; otherwise such an Agent would just appear offline.
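
If you want to see which agents currently carry this audit marker, here is a rough Dev Tools sketch (just an example, not an official procedure; .fleet-agents is a hidden system index, so treat it as read-only here):

GET .fleet-agents/_search
{
  // assumes the audit field written by Endpoint is mapped as a keyword
  "query": {
    "term": {
      "audit_unenrolled_reason": "orphaned"
    }
  },
  "_source": ["agent.id", "local_metadata.host.hostname", "audit_unenrolled_time"]
}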

We will continue to look for the root cause internally.

In the meantime I'd recommend checking the Endpoint service status.

If everything appears fine on the Agent and Endpoint service side, then the only issue is resetting the audit, which is what we suspect is the case.

Thanks for your input and explanations - yes, this stack was upgraded from 8.18.1 to 9.0.1.

For the Endpoint service status (everything appears fine..):

  1. Output of elastic-endpoint.exe status (JSON output can be provided if needed):
     - elastic-agent
       - status: (HEALTHY) Connected
     - elastic-endpoint
       - status: (HEALTHY) Running

  2. Screenshot of the Agent in Kibana->Fleet:

  3. elastic-endpoint test output reports all 3 connections with "Success"

Which option would you recommend:

  • move the agent to a temporary policy without endpoint (and then back)?
  • shall we re-install the Agent?
  • is there a way to "reset the audit" for Endpoint ourselves?
  • wait for the devs to figure it out & for a new version to be available?

It's not very convenient to fix the state. Do you have many endpoints/agents affected?

You can reset the audit for the affected Agent, but it requires a document update. The agent's document in the .fleet-agents index contains the unenrolled reason/time fields which are causing the issue. However, to remove those fields a document update has to be made; as you know, Elasticsearch doesn't have a query syntax to just delete or alter a single field of a document.

          "audit_unenrolled_reason": "orphaned",
          "audit_unenrolled_time": "2025-05-26T19:24:48Z",

I've been in touch with the team that will deliver the fix. The issue is under investigation. One corner case causing this has already been found.


Currently we have three affected agents. Thank you very much for your answer.

I will not touch the .fleet-agents index and will wait patiently for a fixed version - after all, the issue looks only cosmetic to me; functionality is not impacted.


Is there any schedule for a fix?

Since I found no convenient or supported way to get elastic-agents out of “stuck” or “erroneously displayed” states in kibana→fleet→agents, I did it inconveniently and perhaps unsupportedly this way (thanks @lesio for pointing me in this direction):

Disclaimer: don’t try this on your production ELK.. I guess..

  1. Get yourself some privileges on an internal, hidden system-index:

  2. Discover this index - e.g.:

  3. Filter for specific Agents - e.g. via “local_metadata.host.hostname” (a Dev Tools equivalent is sketched after this list):

  4. Delete all ancient, antique, old or not-recent documents from the index (e.g. all docs except the last one in the screenshot above..)

  5. Fix the current document - painlessly (but highly discouraged..) until it looks equivalent to the agent’s real state (which you should check locally - we often see already-upgraded agents on local systems that are displayed with a lower version in Kibana, resisting every upgrade attempt via Fleet..)
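
For step 3, a Dev Tools equivalent of that Discover filter would look roughly like this (the hostname is just a placeholder):

GET .fleet-agents/_search
{
  "query": {
    "match": {
      // placeholder: hostname of the affected Windows host
      "local_metadata.host.hostname": "my-affected-host"
    }
  }
}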

e.g., for step 5: clear the “audit_unenrolled_reason”:

POST .fleet-agents/_update_by_query
{
  "query": {
    "term": {
      "agent.id": "2x7x44f4-7954-4478-xxxx-2c07xx907f17"
    }
  },
  "script": {
    "source": "ctx._source.audit_unenrolled_reason = null;",
    "lang": "painless"
  }
}

I cleared the “orphaned” string last - after resetting all possibly incorrect date-fields (using painless like above..)
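
For the date field that reset looked analogous (again unsupported, same redacted agent id as above; sketched here only for completeness):

POST .fleet-agents/_update_by_query
{
  "query": {
    "term": {
      "agent.id": "2x7x44f4-7954-4478-xxxx-2c07xx907f17"
    }
  },
  "script": {
    // sets the audit timestamp back to null, like the reason field above
    "source": "ctx._source.audit_unenrolled_time = null;",
    "lang": "painless"
  }
}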

btw: I hope we get a convenient and supported way to fix this when the future major-version comes along:

#! this request accesses system indices: [.fleet-agents-7], but in a future major version, direct access to system indices will be prevented by default

Unfortunately this topic has been marked as solved, but the issue still exists?

I do not think this topic can be ignored just because a workaround exists.

Is there any recent info from the dev team about how to solve this without having to edit data in a system index?

I agree, I'm also experiencing the same issue.