We use the following configuration of the Elastic Stack (ELK):
Version: 8.5.3
OS: RHEL 8
Number of master nodes: 1
Number of master and warm nodes: 2
Hot nodes: 2
Kibana instances: 2
Our issue is that navigation in Kibana is quite slow, and the browser frequently reports timeouts.
Moreover, when we send the query GET kbn:api/task_manager/_health about 20-30 times, roughly one of the requests is not answered within the 30s timeout limit.
The Kibana logs repeatedly show the following error every 30s:
{"service":{"node":{"roles":["background_tasks","ui"]}},"ecs":{"version":"8.4.0"},"@timestamp":"2023-12-12T14:53:37.005+01:00","message":"Failed to poll for work: Error: work has timed out","log":{"level":"ERROR","logger":"plugins.taskManager"},"process":{"pid":1013143},"trace":{"id":"15b465f635d80a6082f6c3ba991f0510"},"transaction":{"id":"c85ee3a8920becae"}}
The Elasticsearch logs are clean, and the cluster health is green:
The /status page of Kibana is yellow, showing 99 services as degraded:
The Task Manager health report shows an error status and a fairly high drift:
We noticed this issue after the number of shards on the data nodes reached the limit of 1000 shards per node. At first, we worked around it by increasing the limit to 2000. In the meantime, we have reduced the number of shards to fewer than 700 per data node. However, the issue with Kibana persists.
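For reference, the relevant knobs can be inspected and adjusted from Dev Tools along these lines (a sketch; 2000 is simply the temporary value we used):

```
# Shard count per data node (see the "shards" column)
GET _cat/allocation?v&h=node,shards,disk.percent

# Current value of the per-node shard limit (the default is 1000)
GET _cluster/settings?include_defaults=true&flat_settings=true

# Temporarily raise the limit to 2000 shards per node
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}
```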
What does the resource usage of the Elasticsearch and Kibana nodes look like in Monitoring? That is the best place to start looking for hints about the performance degradation.
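If the Monitoring UI itself is too slow, you can pull comparable numbers straight from the _cat APIs, something like:

```
# Heap, CPU and load per node
GET _cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,load_1m,load_5m

# Queueing or rejections in the main thread pools point at overload
GET _cat/thread_pool/search,write,get?v&h=node_name,name,active,queue,rejected
```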
The Monitoring page in Kibana cannot be viewed. It keeps reloading and eventually times out with the error message 'Request timeout: Check the Elasticsearch Monitoring cluster network connection or the load level of the nodes.'
Just a note: the Elasticsearch status being "green" is a measure of shard allocation, meaning all primary and replica shards are allocated. It is not a measure of query performance. It sounds like your cluster is still overloaded.
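To illustrate, the cluster health API only reports allocation-level information, nothing about query latency:

```
GET _cluster/health?filter_path=status,active_shards_percent_as_number,unassigned_shards
```

A green status here can coexist with a badly overloaded cluster.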
The green state of the cluster indeed has nothing to do with query performance - sorry for the imprecise wording.
However, please note the table with performance measures below the sentence 'The cluster health is green'. The heap and CPU consumption do not seem high, do they?
Could you please suggest steps to dig deeper into diagnostics and possibly confirm the overload?
Hi, it looks like there may be a high number of background tasks in Kibana managing internal state. From what I can tell, a large amount of work is going into managing "search sessions."
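Two quick checks to see how much of the Task Manager workload comes from search sessions (a sketch; I believe the saved object type is search-session and the task documents carry a task.taskType field, but please double-check on your version):

```
# Number of stored search sessions (check the "total" field of the response)
GET kbn:api/saved_objects/_find?type=search-session

# Which task types dominate the Task Manager index
GET .kibana_task_manager/_search
{
  "size": 0,
  "aggs": {
    "by_task_type": {
      "terms": { "field": "task.taskType", "size": 20 }
    }
  }
}
```

If search sessions really make up the bulk of the work and you do not need the feature, it can be switched off in kibana.yml (the setting is data.search.sessions.enabled in 8.x, if I remember correctly), after which the stale sessions should gradually be cleaned up.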