Multiple fleet entries cleanup

Hello,

In my Fleet management there are 3 entries for the agent on one machine. Two of them are obsolete.
I wonder what would happen to the one Fleet Server I need for management if I uninstalled them?

Is it somehow possible to remove these entries without destroying the working Fleet Server?

Thx G.

Hi,

You can safely unenroll the agents you don't need as described here: Unenroll Elastic Agents | Fleet and Elastic Agent Guide [8.9] | Elastic
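
If you prefer the API over the UI, Fleet also exposes an unenroll endpoint through Kibana. A rough sketch, where the Kibana URL, credentials and agent ID are placeholders for your own values:

```sh
# Unenroll a single agent via the Kibana Fleet API.
# <KIBANA_URL>, <AGENT_ID> and the credentials are placeholders.
# "revoke": true unenrolls immediately instead of waiting for the agent to check in.
curl -X POST "<KIBANA_URL>/api/fleet/agents/<AGENT_ID>/unenroll" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -u elastic:<PASSWORD> \
  -d '{"revoke": true}'
```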

Hello Julia,

Unfortunately this did not work.

There had been 3 entries in the list on the same machine.

After I removed the 2 obsolete ones (they had no activity), the remaining one ended up in a strange state.

If I ask for the agent state I get this:

State: HEALTHY
Message: Running
Fleet State: STARTING
Fleet Message: (no message)
Components:

  • fleet-server (HEALTHY)
    Healthy: communicating with pid '1299'
  • udp (HEALTHY)
    Healthy: communicating with pid '1306'
  • filestream (HEALTHY)
    Healthy: communicating with pid '1313'
  • beat/metrics (HEALTHY)
    Healthy: communicating with pid '1314'
  • http/metrics (HEALTHY)
    Healthy: communicating with pid '1329'

In the GUI the Fleet Server is not available.

Well, this is a test environment and I will now "go hunting" for a solution...

Can you check in agent logs if you see any error messages? You can request the diagnostics bundle with this command: elastic-agent diagnostics collect
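
For reference, on a default Linux install it looks roughly like this; the exact archive name varies by version:

```sh
# Collect a diagnostics bundle (run with elevated privileges).
sudo elastic-agent diagnostics collect

# The command writes an archive such as elastic-agent-diagnostics-<timestamp>.zip
# into the current working directory; it contains the agent logs and state.
```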

It's currently running. I am getting a warning, but this seems to be a known issue:

[WARNING] Could not redact state.yaml due to unmarshalling error: yaml: invalid map key: map[interface {}]interface {}{"unitid":"fleet-server-default-fleet-server-fleet_server-bad6ee92-babd-4f47-a612-eb78cb0f27ea", "unittype":0}

I wonder if it really takes this much time, but I will be patient...

Yeah, that warning is not relevant. Is there anything else in the logs that explains why the agent is stuck in the starting state?

Well, the command is still running? I guess there's a problem and the diagnostics will not be collected...
I can check the agent log directly...
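
(On a default Linux install the logs should live roughly here; the hashed data directory is specific to each installation, so treat the exact path as an assumption:)

```sh
# Agent logs on a default Linux install; the hashed directory name varies per install.
sudo ls /opt/Elastic/Agent/data/elastic-agent-*/logs/

# Grep the logs for errors and show the most recent hits:
sudo grep -i "error" /opt/Elastic/Agent/data/elastic-agent-*/logs/elastic-agent-*.ndjson | tail -n 50
```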

I am a bit confused now. After hours, the Fleet error went away. The agent still did not communicate. I guess I will need to investigate further and check the log in detail.
As far as it looks to me now, deleting the obsolete agent entries in Fleet management somehow had a negative impact on the agent running on the server.

Hello Julia,

After some testing I found out that this might be a kind of "reboot" and "timing" issue.
I made an upgrade of the operating system and rebooted the machine.
After some time the Fleet Server was still not available. All I needed to do was stop and start the "elastic-agent" service, and the Fleet Server came back.
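
For reference, the restart itself was nothing special; either of the following should do it (the service name assumes a systemd-based default install):

```sh
# Restart the installed Elastic Agent service on systemd-based systems:
sudo systemctl restart elastic-agent

# Or use the agent's own restart command:
sudo elastic-agent restart
```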

This seems to be reproducible on my test machine.

So I was on the wrong path; the main issue is the agent and not the "remove obsolete ..." step. Sorry for this.

The next issue I will try to solve is an agent that is up to date but still shows "Updating" in Fleet. For this there's some info available and I will check if it works for me as well.

Have a good time and thank you for supporting me.

Which version of the agent are you using? The issue with the agent not starting up after a reboot sounds similar to this one that we fixed in 8.9: [SLES15]: Fleet-server Agent gets into offline state on machine reboot. · Issue #2431 · elastic/fleet-server · GitHub
"updating" status can happen if the agent hasn't checked in yet, or an upgrade has started that is in progress.
We have this support article about agents stuck in updating: Elastic Support Hub

I am at 8.8.2 and now I am updating to 8.9.1. Thx for the hint...
Regarding the "stuck updating" agent, I will manually upgrade it to 8.9.1 to see if the problem goes away.

Funny: the agent updated to 8.9.1 and the version changed, but the state is still wrong. Guess I will read your support documents now...

So, I found my own solution, as the suggested ones seemed to be "not easy". Finding out the "superuser" credentials and handling the commands would, well, have cost a lot of time.

My way was to unenroll the agent, remove it from the system manually, and re-install it.
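
Roughly, the steps were something like the following (assuming a regular agent, not a Fleet Server; a Fleet Server install needs additional --fleet-server-* flags). The download URL, Fleet Server URL and enrollment token are placeholders for your own values:

```sh
# 1. Remove the old installation (after unenrolling the agent in Fleet):
sudo elastic-agent uninstall

# 2. Download and unpack the agent again (version and URL are examples):
curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.9.1-linux-x86_64.tar.gz
tar xzvf elastic-agent-8.9.1-linux-x86_64.tar.gz
cd elastic-agent-8.9.1-linux-x86_64

# 3. Re-install and enroll against Fleet (<FLEET_URL> and <TOKEN> are placeholders):
sudo ./elastic-agent install --url=<FLEET_URL> --enrollment-token=<TOKEN>
```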

Worked fine :wink:

Now there's a new question, but I will put it into a new thread.

Thanks for any help provided. I learned a lot.
