After being offline for a day, the Elasticsearch cluster became very slow. I checked for possible misconfigurations but could not find anything that would make it faster. The cluster is green and there are no unassigned shards; however, it is using less than 20% of the CPU and it is super slow.
I don't see it in your post, but what version of Elasticsearch are you running? 8.x received a good number of improvements for clusters with many shards, so if you're running something like 7.x you might want to consider upgrading.
Also, general question, what makes you think the cluster is slow? Is there a specific use case you've seen issues with? Can you provide examples of the issue that "shows" slowness?
I'm using version 8.3.3, but I'm planning to update to the latest version on the weekend. I say it is slow because everything I try to do in Kibana takes longer than normal: looking at alerts, searching for information, or loading any dashboard. I think it is Elasticsearch because it is using so little CPU, and that was not normal before.
The way to calculate how many shards you should have in a node based on the heap size changed in 8.3, but based on your disk size you seem to have too many small shards.
Assuming that you have something close to 420 GB of data, 994 shards would give an average of roughly 422 MB per shard. You should aim for a shard size between 10 GB and 50 GB.
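To make the arithmetic above concrete, here is a rough Python sketch. The function names are made up for illustration, the 420 GB and 994 figures are the estimates from this thread, and the calculation assumes decimal units (1 GB = 1000 MB) and ignores the split between primaries and replicas.

```python
import math


def average_shard_size_mb(total_data_gb: float, shard_count: int) -> float:
    """Average shard size in MB, assuming 1 GB = 1000 MB."""
    return total_data_gb * 1000 / shard_count


def target_shard_count_range(total_data_gb: float,
                             min_shard_gb: float = 10,
                             max_shard_gb: float = 50) -> tuple[int, int]:
    """Fewest and most shards that keep the average shard size
    between min_shard_gb and max_shard_gb."""
    fewest = max(1, math.ceil(total_data_gb / max_shard_gb))
    most = max(1, math.floor(total_data_gb / min_shard_gb))
    return fewest, most


# Estimates taken from this thread, not measured values.
avg_mb = average_shard_size_mb(420, 994)
fewest, most = target_shard_count_range(420)
print(f"average shard size: about {avg_mb:.1f} MB")
print(f"shard count for 10-50 GB shards: {fewest} to {most}")
```

With those estimates, the same data would fit in somewhere between 9 and 42 shards instead of 994, which gives an idea of how far off the current layout is.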
What is your use case? Do you have time based data?
Also, your CPU usage may be low, but your load is high for your specs. You showed a load of 22 for a 16-CPU node, which could mean that the disk on this node is struggling with the amount of I/O.
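As a quick way to read those numbers, the load average can be divided by the CPU count. This is only a sketch: the helper name is hypothetical, and a ratio above 1.0 is a rule of thumb for "more runnable or I/O-waiting tasks than cores", not a hard limit.

```python
def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Load average divided by the number of CPUs.

    On Linux the load average also counts tasks blocked on disk I/O,
    so a ratio above 1.0 combined with low CPU usage often points at
    the disk rather than the processor.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Values reported in this thread: load of 22 on a 16-CPU node.
ratio = load_per_cpu(22, 16)
print(f"load per CPU: {ratio:.3f}")
```

On the node itself, the inputs could come from os.getloadavg() and os.cpu_count(), and a tool such as iostat would help confirm whether the disk is the bottleneck.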