Kibana Fleet high CPU Load on Elasticsearch when adding Integrations

Since Elastic-Stack 8.8.0 we have observed a reproducible issue: when navigating Fleet, and especially when modifying integrations, the Elastic-Stack cluster stalls due to high CPU load. In the following screenshot I tried to add the System integration, and the Elasticsearch process went to 100% CPU load for several minutes.

7 Elasticsearch instances
2 Kibana instances
2 Fleet instances

Per Server
AMD EPYC 7313
8TB NVME SSD
128 GB Memory
Debian 12

There seems to be a bug in one of the components, but I would need help tracking it down.

It seems that the warm and cold nodes are hit by this issue. We have about 50 Elastic Agents with about 20 policies.

What I've found in the Kibana log:

2023-06-23T16:37:15.944604+02:00 XXX kibana[79289]: [2023-06-23T16:37:15.944+02:00][ERROR][plugins.taskManager] [WorkloadAggregator]: ResponseError: search_phase_execution_exception
2023-06-23T16:37:15.944672+02:00 XXX kibana[79289]: #011Root causes:
2023-06-23T16:37:15.944716+02:00 XXX kibana[79289]: #011#011parse_exception: operator not supported for date math [+12500ms]
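
A note on the parse_exception: as far as I can tell from the Elasticsearch date math documentation, the supported units are y, M, w, d, h, H, m and s, so there is no millisecond unit. A value like +12500ms would then be read as 12500 minutes followed by a stray "s" where an operator is expected. The following is only a rough Python sketch of that reading, not the actual Elasticsearch parser; the function name is made up for illustration.

```python
import re

# Units accepted by Elasticsearch date math according to its documentation.
# Note that there is no millisecond unit in this set.
DATE_MATH_UNITS = set("yMwdhHms")


def unsupported_date_math_tail(math):
    """Return the part of a date-math suffix that cannot be parsed, or None.

    Simplified illustration: each step is an operator (+, - or /),
    an optional number for + and -, and exactly one unit character.
    """
    pos = 0
    while pos < len(math):
        op = math[pos]
        if op not in "+-/":
            # Anything else in operator position is rejected.
            return math[pos:]
        pos += 1
        digits = re.match(r"\d+", math[pos:])
        if op != "/" and digits:
            pos += digits.end()
        if pos >= len(math) or math[pos] not in DATE_MATH_UNITS:
            return math[pos:] or math
        pos += 1
    return None


print(unsupported_date_math_tail("+12500ms"))  # the trailing "s" is left over
print(unsupported_date_math_tail("+12s"))      # parses cleanly
```

If that reading is right, the expression is generated on the Kibana side (Task Manager), so it is not something that can be fixed through cluster settings.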

The Elasticsearch log file doesn't seem to contain anything interesting.

Furthermore, there seems to be an issue with the persistence of the integrations. It's not clear when this happens but the integrations can be installed and all the visualizations show correctly on the dashboards. Then after some time the dashboard stops working and the following errors show up:

After reinstalling the integration it seems to work again until it stops working once more. I'm not sure when this happens. I currently suspect the following triggers:

  • installing another integration
  • restarting the Kibana process

Trying to force a reinstall with

POST kbn:/api/fleet/epm/packages/windows/1.24.0
{"force":true}

generates

{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Non-unique import objects detected: [dashboard:windows-c77e06c0-9e7c-11ea-af6f-cfdb1ee1d6c8,dashboard:windows-d9eba730-c991-11e7-9835-2f31fe08873b,visualization:windows-1eeaaf70-9f23-11ea-bef1-95118e62a7c1,visualization:windows-23a5fff0-c98e-11e7-9835-2f31fe08873b,visualization:windows-2dbabdf0-9f29-11ea-bef1-95118e62a7c1,visualization:windows-35f5ad60-c996-11e7-9835-2f31fe08873b,visualization:windows-3e55daa0-9e8e-11ea-af6f-cfdb1ee1d6c8,visualization:windows-52543ef0-9e95-11ea-af6f-cfdb1ee1d6c8,visualization:windows-70751050-9f33-11ea-bef1-95118e62a7c1,visualization:windows-78874900-9f30-11ea-bef1-95118e62a7c1,visualization:windows-7adbce50-9e96-11ea-af6f-cfdb1ee1d6c8,visualization:windows-7f3e7710-9e94-11ea-af6f-cfdb1ee1d6c8,visualization:windows-830c45f0-c991-11e7-9835-2f31fe08873b,visualization:windows-92a2a6b0-9f29-11ea-bef1-95118e62a7c1,visualization:windows-9ec52c30-9e91-11ea-af6f-cfdb1ee1d6c8,visualization:windows-b0c5d570-9e7c-11ea-af6f-cfdb1ee1d6c8,visualization:windows-c0945210-9e8b-11ea-af6f-cfdb1ee1d6c8,visualization:windows-c36b2ba0-ca29-11e7-9835-2f31fe08873b,visualization:windows-d27dea70-9f32-11ea-bef1-95118e62a7c1,visualization:windows-e20b3940-9e9a-11ea-af6f-cfdb1ee1d6c8,visualization:windows-e64ff750-9f28-11ea-bef1-95118e62a7c1,visualization:windows-eb8277d0-c98c-11e7-9835-2f31fe08873b,visualization:windows-f9fa55f0-9f34-11ea-bef1-95118e62a7c1,visualization:windows-fbb025e0-9e7c-11ea-af6f-cfdb1ee1d6c8,search:windows-11a61760-9f27-11ea-bef1-95118e62a7c1,search:windows-b6b7ccc0-c98d-11e7-9835-2f31fe08873b]"
}
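
Since the message in that response is hard to read as one line, here is a small Python sketch that splits it into (type, id) pairs and counts them per saved object type. It assumes the message always has the form shown above ("Non-unique import objects detected: [type:id,...]"); the function names are made up for illustration and are not part of any Kibana API.

```python
import re
from collections import Counter


def parse_non_unique_objects(message):
    """Split the Fleet error message into a list of (type, id) tuples."""
    match = re.search(r"Non-unique import objects detected: \[(.*)\]", message)
    if not match:
        return []
    pairs = []
    for item in match.group(1).split(","):
        # Each entry looks like "dashboard:windows-<uuid>".
        obj_type, _, obj_id = item.strip().partition(":")
        pairs.append((obj_type, obj_id))
    return pairs


def count_by_type(pairs):
    """Count how many objects of each saved object type were reported."""
    return Counter(obj_type for obj_type, _ in pairs)


# Shortened example using two IDs from the response above.
sample = (
    "Non-unique import objects detected: "
    "[dashboard:windows-c77e06c0-9e7c-11ea-af6f-cfdb1ee1d6c8,"
    "search:windows-11a61760-9f27-11ea-bef1-95118e62a7c1]"
)
pairs = parse_non_unique_objects(sample)
print(pairs)
print(count_by_type(pairs))
```

The resulting IDs can then be looked up under Stack Management > Saved Objects in each space to see where the conflicting copies live.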

I can confirm that after restarting Kibana the data view relationship breaks and the integration dashboards stop working.

Furthermore:

  • Integrations are randomly reinstalled in other Kibana Spaces. The random reinstall fixes the "data view not found" issue in the space where the integration happened to be installed, but this is not a solution, because there are situations where the dashboards should only belong to certain Kibana Spaces.

  • When reinstalling the integration, it seems to work in the chosen space until Kibana is restarted; after that the dashboards only work in another random space.

  • There were old objects from 2022 that had been installed by past integration versions (Windows, System) but were not cleaned up by newer versions. These artefacts no longer seem to be in use; it also seems they were never cleaned up precisely because nothing referenced them.

  • Completely removing an integration and reinstalling it doesn't fix the issue.

Integrations

  • Windows 1.24.0
  • System 1.34.0
  • Cisco ISE 1.9.0

Glad to help if any more information is required. It's a PITA to work with the integrations when suddenly everything starts to break and customers are on the line.

An idea: could there be interference when a data view already exists with the same name that the integration would use?

Integration in Space "S"

Data View in Space "S"

Kibana Space "L" Saved Objects: Many with the same name
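
To check that idea, one could export the saved objects of each space (for example via the saved objects find API, GET /s/<space>/api/saved_objects/_find?type=index-pattern) and compare titles across spaces. The following Python sketch only does the comparison step. It assumes each object is a dict with "type" and "attributes.title", as in the find response; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_shared_titles(objects_by_space):
    """Return {(type, title): [spaces]} for titles present in several spaces.

    objects_by_space maps a space ID to a list of saved objects
    (dicts shaped like the entries of a _find response).
    """
    spaces_by_key = defaultdict(set)
    for space, objects in objects_by_space.items():
        for obj in objects:
            title = obj.get("attributes", {}).get("title")
            if title is not None:
                spaces_by_key[(obj["type"], title)].add(space)
    return {
        key: sorted(spaces)
        for key, spaces in spaces_by_key.items()
        if len(spaces) > 1
    }


# Hypothetical sample data, not taken from the affected cluster.
sample = {
    "S": [{"type": "index-pattern", "id": "a1", "attributes": {"title": "logs-*"}}],
    "L": [
        {"type": "index-pattern", "id": "b2", "attributes": {"title": "logs-*"}},
        {"type": "index-pattern", "id": "c3", "attributes": {"title": "metrics-*"}},
    ],
}
print(find_shared_titles(sample))
```

Any key that shows up in more than one space would be a candidate for the kind of name clash suspected above.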

Furthermore, memory usage seems to spike when working with Kibana, especially when using the Elastic Fleet and Integrations pages. The client response time seems very high too. There are 3 Kibana nodes but none is under load. We've increased the memory for Kibana from 4 GB to 8 GB and see peaks around 5.5 GB of memory usage (single user using Fleet/Integrations). As you can see in the graph, there is no user activity after the peaks. I'm not sure whether this much memory usage is expected.

Upgrading to Elastic-Stack 8.8.2 resolved the high CPU issue.


We have the same problem: the data view is duplicated and the dashboard cannot locate it. The problems started with version 8.8.0.

I've opened up a dedicated thread for this issue:
Elastic Integrations fail to install and lead to broken Dashboards when used with multiple Kibana Spaces - Elastic Stack / Kibana - Discuss the Elastic Stack

Thank you
