Kibana Fleet high CPU Load on Elasticsearch when adding Integrations

matled · June 22, 2023, 11:56pm

Since Elastic Stack 8.8.0 we have observed a reproducible issue: when navigating Fleet, and especially when modifying integrations, the Elastic Stack cluster stalls due to high CPU load. In the following screenshot I tried to add the System integration, and the Elasticsearch process went to 100% CPU load for several minutes.

7 Elasticsearch instances
2 Kibana instances
2 Fleet instances

Per Server
AMD EPYC 7313
8TB NVME SSD
128 GB Memory
Debian 12

It seems there is a bug in one of the components, but I would need help tracking it down.

matled · June 23, 2023, 2:57pm

It seems the warm and cold nodes are hit by this issue. We have about 50 Elastic Agents with about 20 policies.

What I've found in the Kibana log:

2023-06-23T16:37:15.944604+02:00 XXX kibana[79289]: [2023-06-23T16:37:15.944+02:00][ERROR][plugins.taskManager] [WorkloadAggregator]: ResponseError: search_phase_execution_exception
2023-06-23T16:37:15.944672+02:00 XXX kibana[79289]: #011Root causes:
2023-06-23T16:37:15.944716+02:00 XXX kibana[79289]: #011#011parse_exception: operator not supported for date math [+12500ms]
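
As far as I can tell, the root cause in the log above is consistent with Elasticsearch date math accepting only the units y, M, w, d, h, H, m and s: in `+12500ms` the `m` would be read as minutes, and the trailing `s` is then not a valid operator. A minimal sketch of that check, assuming those documented units; `is_valid_date_math` is a hypothetical helper, not part of Kibana or Elasticsearch:

```python
import re

# Assumption: units from the Elasticsearch date math docs (no "ms" unit).
_UNITS = "yMwdhHms"

# A math suffix is a sequence of "+N<unit>", "-N<unit>" or "/<unit>" terms.
_DATE_MATH = re.compile(r"(?:[+-]\d*[%s]|/[%s])*" % (_UNITS, _UNITS))


def is_valid_date_math(math: str) -> bool:
    """Return True if the suffix after the anchor (e.g. after `now`) parses."""
    return _DATE_MATH.fullmatch(math) is not None


for expr in ("+12500ms", "+12s", "-1d/d"):
    print(expr, is_valid_date_math(expr))
```

If this reading is right, the expression would have to be given in whole seconds (or another supported unit) to be accepted.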

The Elasticsearch log file doesn't seem to contain anything interesting.

matled · June 23, 2023, 11:03pm

Furthermore, there seems to be an issue with the persistence of the integrations. It's not clear when this happens, but the integrations can be installed and all visualizations display correctly on the dashboards. Then, after some time, the dashboards stop working and the following errors show up:

After reinstalling the integration it works again until it stops working once more. I'm not sure what triggers this. I currently suspect one of the following:

installing another integration
restarting kibana process

matled · June 24, 2023, 12:34am

Trying to force a reinstall with

POST kbn:/api/fleet/epm/packages/windows/1.24.0
{"force":true}

generates

{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Non-unique import objects detected: [dashboard:windows-c77e06c0-9e7c-11ea-af6f-cfdb1ee1d6c8,dashboard:windows-d9eba730-c991-11e7-9835-2f31fe08873b,visualization:windows-1eeaaf70-9f23-11ea-bef1-95118e62a7c1,visualization:windows-23a5fff0-c98e-11e7-9835-2f31fe08873b,visualization:windows-2dbabdf0-9f29-11ea-bef1-95118e62a7c1,visualization:windows-35f5ad60-c996-11e7-9835-2f31fe08873b,visualization:windows-3e55daa0-9e8e-11ea-af6f-cfdb1ee1d6c8,visualization:windows-52543ef0-9e95-11ea-af6f-cfdb1ee1d6c8,visualization:windows-70751050-9f33-11ea-bef1-95118e62a7c1,visualization:windows-78874900-9f30-11ea-bef1-95118e62a7c1,visualization:windows-7adbce50-9e96-11ea-af6f-cfdb1ee1d6c8,visualization:windows-7f3e7710-9e94-11ea-af6f-cfdb1ee1d6c8,visualization:windows-830c45f0-c991-11e7-9835-2f31fe08873b,visualization:windows-92a2a6b0-9f29-11ea-bef1-95118e62a7c1,visualization:windows-9ec52c30-9e91-11ea-af6f-cfdb1ee1d6c8,visualization:windows-b0c5d570-9e7c-11ea-af6f-cfdb1ee1d6c8,visualization:windows-c0945210-9e8b-11ea-af6f-cfdb1ee1d6c8,visualization:windows-c36b2ba0-ca29-11e7-9835-2f31fe08873b,visualization:windows-d27dea70-9f32-11ea-bef1-95118e62a7c1,visualization:windows-e20b3940-9e9a-11ea-af6f-cfdb1ee1d6c8,visualization:windows-e64ff750-9f28-11ea-bef1-95118e62a7c1,visualization:windows-eb8277d0-c98c-11e7-9835-2f31fe08873b,visualization:windows-f9fa55f0-9f34-11ea-bef1-95118e62a7c1,visualization:windows-fbb025e0-9e7c-11ea-af6f-cfdb1ee1d6c8,search:windows-11a61760-9f27-11ea-bef1-95118e62a7c1,search:windows-b6b7ccc0-c98d-11e7-9835-2f31fe08873b]"
}
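
The message above is hard to read as a single line. A minimal sketch for breaking it into saved object types and IDs, assuming the bracketed list always has the `type:id,type:id` shape seen here; `parse_non_unique_objects` is a hypothetical helper and the sample message is shortened to two entries:

```python
import re
from collections import defaultdict
from typing import Dict, List


def parse_non_unique_objects(message: str) -> Dict[str, List[str]]:
    """Group the `type:id` pairs from the bracketed list by saved object type."""
    match = re.search(r"\[(.*)\]", message)
    if match is None:
        return {}
    grouped = defaultdict(list)
    for entry in match.group(1).split(","):
        # Split on the first colon only; the ID itself contains dashes, not colons.
        obj_type, _, obj_id = entry.strip().partition(":")
        grouped[obj_type].append(obj_id)
    return dict(grouped)


# Shortened copy of the error message (first and last entry only).
message = (
    "Non-unique import objects detected: "
    "[dashboard:windows-c77e06c0-9e7c-11ea-af6f-cfdb1ee1d6c8,"
    "search:windows-11a61760-9f27-11ea-bef1-95118e62a7c1]"
)
objects = parse_non_unique_objects(message)
for obj_type, ids in objects.items():
    print(obj_type, len(ids))
```

Run against the full message, this gives a per-type count of the objects Kibana considers non-unique, which makes it easier to look them up in Saved Objects.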

I can confirm that after restarting Kibana the data view relationship breaks and the integration dashboards stop working.

matled · June 24, 2023, 12:53am

Furthermore:

Integrations are randomly reinstalled in other Kibana Spaces. The random reinstall fixes the "data view not found" issue in the space where the integration happened to be installed, but this is not a solution, as there are situations where the dashboards should only belong to certain Kibana Spaces.
When reinstalling the integration, it seems to work in the chosen space until Kibana is restarted; after that the dashboards only work in another random space.
There were old objects from 2022 that had been installed by past integration versions (Windows, System) but were not cleaned up by newer versions. These artefacts no longer seem to be used; it also seems they were not cleaned up precisely because nothing references them.
Completely removing an integration and reinstalling it doesn't fix the issue.

Integrations

Windows 1.24.0
System 1.34.0
Cisco ISE 1.9.0

Glad to help if any more information is required. It is a PITA to work with the integrations when suddenly everything starts to break and customers are on the line.

matled · June 24, 2023, 11:10pm

An idea: could there be interference when a data view already exists with the same name the integration would use?

Integration in Space "S"

Data View in Space "S"

Kibana Space "L" Saved Objects: Many with the same name
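
To test this idea, one could compare data view titles across spaces. A minimal sketch, assuming the titles per space have already been collected by some means (for example from a saved objects export per space); `find_colliding_titles` is a hypothetical helper and the sample data below is made up, only the space names "S" and "L" come from this post:

```python
from collections import defaultdict
from typing import Dict, List


def find_colliding_titles(titles_by_space: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each data view title that occurs more than once, with the spaces it occurs in."""
    locations = defaultdict(list)
    for space, titles in titles_by_space.items():
        for title in titles:
            locations[title].append(space)
    # Keep titles seen in several spaces or several times within one space.
    return {title: spaces for title, spaces in locations.items() if len(spaces) > 1}


# Hypothetical inventory: space name -> data view titles found there.
sample = {
    "S": ["logs-*", "metrics-*"],
    "L": ["logs-*", "logs-*", "custom-*"],
}
collisions = find_colliding_titles(sample)
print(collisions)
```

Any title reported here would be a candidate for the kind of name clash suspected above.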

matled · June 26, 2023, 5:33pm

Furthermore, memory usage seems to spike when working with Kibana, especially when using Elastic Fleet and the Integrations page. The client response time seems very high too. There are 3 Kibana nodes, but none is under load. We've increased the memory for Kibana from 4 GB to 8 GB and see peaks of around 5.5 GB of memory usage (a single user using Fleet/Integrations). As you can see in the graph, after the peaks there is no user activity anymore. I'm not sure whether this much memory usage is expected.

matled · June 29, 2023, 6:41pm

Upgrading to Elastic Stack 8.8.2 resolved the high CPU issue.

xn1ght · June 30, 2023, 11:23am

We have the same problem: the data view is duplicated and the dashboards cannot locate it. The problems started with version 8.8.0.

matled · June 30, 2023, 11:27am

I've opened up a dedicated thread for this issue:
Elastic Integrations fail to install and lead to broken Dashboards when used with multiple Kibana Spaces - Elastic Stack / Kibana - Discuss the Elastic Stack

xn1ght · June 30, 2023, 11:29am

Thank you

system · July 28, 2023, 11:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
