Universal Profiling is not working after upgrade from 8.9.0 to 8.10.3

Nabeel_Ahmed_NAK · October 12, 2023, 2:35pm

Hi All

I have upgraded Elasticsearch from 8.9.0 to 8.10.3. Universal Profiling was working fine before the upgrade.
After the upgrade, Universal Profiling is not working.

I followed the guide "Upgrade Universal Profiling" (Elastic Observability 8.10)
and completed all the steps, but the issue was not resolved.

I also ran POST kbn:/internal/fleet/reset_preconfigured_agent_policies/policy-elastic-agent-on-cloud

Here are the details
Universal Profiling Collector Version is 8.10.3
Universal Profiling Symbolizer Version is 8.10.3
Universal Profiling Agent Version is 8.9.0

Error logs:

[elastic_agent][info] Component state changed pf-host-agent-default (STARTING->HEALTHY): Healthy: communicating with pid '2620939'

[elastic_agent][info] Unit state changed pf-host-agent-default-27d84666-aefb-4405-b5d3-56f9acaf9122 (STARTING->CONFIGURING): Set initial configuration

[elastic_agent][error] Component state changed pf-host-agent-default (HEALTHY->FAILED): Failed: pid '2619891' exited with code '1'

[elastic_agent][error] Unit state changed pf-host-agent-default-27d84666-aefb-4405-b5d3-56f9acaf9122 (CONFIGURING->FAILED): Failed: pid '2619891' exited with code '1'

[elastic_agent][error] Unit state changed pf-host-agent-default (STARTING->FAILED): Failed: pid '2619891' exited with code '1'
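For readers triaging similar output, here is a minimal Python sketch that pulls the state transitions and process IDs out of log lines shaped like the ones above. The line format is inferred only from these five lines, and the helper names (`parse_state_changes`, `pids_by_state`) are made up for illustration. It is not part of any Elastic tooling.

```python
import re

# Pattern inferred from the five log lines in this post (an assumption,
# not an official Elastic Agent log specification).
LINE_RE = re.compile(
    r"\[elastic_agent\]\[(?P<level>\w+)\]\s+"
    r"(?P<kind>Component|Unit) state changed\s+"
    r"(?P<name>\S+)\s+"
    r"\((?P<old>\w+)->(?P<new>\w+)\):\s*"
    r"(?P<message>.*)"
)
PID_RE = re.compile(r"pid '(\d+)'")


def parse_state_changes(lines):
    """Return one dict per recognized state-change line; skip everything else."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        event = match.groupdict()
        pid = PID_RE.search(event["message"])
        event["pid"] = int(pid.group(1)) if pid else None
        events.append(event)
    return events


def pids_by_state(events):
    """Group the reported pids by the state the component or unit moved into."""
    grouped = {}
    for event in events:
        if event["pid"] is not None:
            grouped.setdefault(event["new"], set()).add(event["pid"])
    return grouped


SAMPLE = [
    "[elastic_agent][info] Component state changed pf-host-agent-default (STARTING->HEALTHY): Healthy: communicating with pid '2620939'",
    "[elastic_agent][info] Unit state changed pf-host-agent-default-27d84666-aefb-4405-b5d3-56f9acaf9122 (STARTING->CONFIGURING): Set initial configuration",
    "[elastic_agent][error] Component state changed pf-host-agent-default (HEALTHY->FAILED): Failed: pid '2619891' exited with code '1'",
    "[elastic_agent][error] Unit state changed pf-host-agent-default-27d84666-aefb-4405-b5d3-56f9acaf9122 (CONFIGURING->FAILED): Failed: pid '2619891' exited with code '1'",
    "[elastic_agent][error] Unit state changed pf-host-agent-default (STARTING->FAILED): Failed: pid '2619891' exited with code '1",
]

for state, pids in sorted(pids_by_state(parse_state_changes(SAMPLE)).items()):
    print(state, sorted(pids))
```

On the sample, the HEALTHY transition reports pid 2620939 while every FAILED transition reports pid 2619891, so the lines appear to describe more than one start attempt of the same component.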

Thanks

Nabeel_Ahmed_NAK · October 16, 2023, 6:20am

Can anyone help me?

Francesco_Gualazzi · October 19, 2023, 11:07am

Hello Nabeel

I am Francesco from the Universal Profiling team.
Thanks for reporting this issue. Can you confirm whether only the Host-Agent integration is declared unhealthy in Fleet?

The upgrade from 8.9.0 is notoriously problematic, which is why we wrote the section in the guide that you mentioned. Still, there may be other edge cases that are not resolved by resetting the default agent policy.

Can you please share a screenshot of the status of the Fleet agents?
Cheers

Francesco_Gualazzi · October 19, 2023, 11:11am

One thing that comes to mind is: if it is the host-agent integration that is failing, you may have to remove it and add it back again in order for it to work.
Did you try this already?

Nabeel_Ahmed_NAK · October 19, 2023, 11:28am

Hi @Francesco_Gualazzi

I have tried removing and reinstalling the integration multiple times, but it still does not work for me.

thanks

Francesco_Gualazzi · October 19, 2023, 1:10pm

What is the Linux kernel version that the ceb-staging host runs on?
I am asking because there is a range of kernels that the agent currently can't run on under the Elastic Agent integration.

The details are listed in the "Add Data" page

Can you also provide logs from the host-agent? They will help us understand what's happening.

Nabeel_Ahmed_NAK · October 19, 2023, 1:40pm

@Francesco_Gualazzi

In 8.9.0 the same host was running fine.
BTW host.os.kernel is 5.15.0-48-generic.

The logs for the pf host agent are listed in my first message, please refer to those. They are still the same.

Thanks

Francesco_Gualazzi · October 19, 2023, 2:21pm

> In 8.9.0 the same host was running fine.
> BTW host.os.kernel is 5.15.0-48-generic.

Thanks for confirming. That kernel is not affected by the bug that prevents the host-agent from running.
We introduced this new kernel check in 8.10 to prevent kernel freezes due to a kernel bug.
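As an illustration of this kind of check, here is a minimal Python sketch that parses a kernel release string and tests it against a version range. It is not the host-agent's actual implementation, and the affected range is not stated in this thread, so the bounds below are placeholders that must be replaced with the values listed on the "Add Data" page. The function names are hypothetical.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string such as '5.15.0-48-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_affected_range(release, first_affected, first_fixed):
    """True if first_affected <= release < first_fixed, compared as tuples."""
    return first_affected <= parse_kernel_release(release) < first_fixed


# Placeholder bounds, NOT the real affected range (assumption for the demo).
PLACEHOLDER_FIRST_AFFECTED = (99, 0, 0)
PLACEHOLDER_FIRST_FIXED = (99, 1, 0)

print(parse_kernel_release("5.15.0-48-generic"))  # (5, 15, 0)
print(
    in_affected_range(
        "5.15.0-48-generic", PLACEHOLDER_FIRST_AFFECTED, PLACEHOLDER_FIRST_FIXED
    )
)
```

On a live host, the release string could come from `platform.release()` in the Python standard library instead of a hard-coded value.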

> The logs for pf host agent are listed in my first message please refer to those. they are still the same.

The messages you posted above are the Elastic Agent logs. The host-agent logs (that is, the integration component's logs) are in a dedicated dataset.
You can refer to this doc to enable and view host-agent logs when running via Elastic Agent.

Let me know, Cheers

Nabeel_Ahmed_NAK · October 20, 2023, 6:24am

Hi @Francesco_Gualazzi

As I mentioned, the logs above are the host-agent logs from the agent's monitoring.

Thanks

Francesco_Gualazzi · October 23, 2023, 4:52pm

Hi Nabeel

Unfortunately, those logs do not say much about why the host-agent is failing.
I wonder if you could try running the host-agent in another deployment mode, not via Elastic Agent. That would help confirm the source of the issue: whether it is a misconfiguration of the backend due to the upgrade or something else.

I am afraid that the event.dataset you are displaying is not the correct one; it should be named pf-host-agent.
If there are no logs from that dataset, it may mean the host-agent is not starting at all, and the reason why is easier to find when running outside of Elastic Agent.
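To check whether that dataset contains any recent documents, a search body along these lines could be used. This is a minimal Python sketch that only builds and prints the request body. It assumes `event.dataset` is mapped as a keyword field and `@timestamp` is the time field, as in ECS. The index or data stream to search is not named in this thread, so it is left out. The one-hour window and the function name `build_dataset_query` are arbitrary choices for the example.

```python
import json


def build_dataset_query(dataset="pf-host-agent", since="now-1h", size=20):
    """Build an Elasticsearch search body for the newest documents of one dataset."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the dataset name given in this thread.
                    {"term": {"event.dataset": dataset}},
                    # Restrict to recent documents (window is an assumption).
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
    }


print(json.dumps(build_dataset_query(), indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index or data stream that holds the agent logs. Zero hits would be consistent with the host-agent not starting at all.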

Please let me know if you are able to run the agent using a different method.
Cheers

Nabeel_Ahmed_NAK · October 26, 2023, 8:00am

@Francesco_Gualazzi
Thanks

Unfortunately, I don't have access to that server.

Francesco_Gualazzi · October 26, 2023, 9:24am

You could try running the host-agent on an Ubuntu VM with the same OS version, using the instructions provided in the Add Data page (both with the Binary and via Elastic Agent).

This would help us understand why the agent is not running under Elastic Agent: whether the Fleet configuration is broken or the host-agent is entirely unable to operate on that specific OS/kernel.

system · November 23, 2023, 9:25am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.