My Elasticsearch version is 7.9.2. I'm using Filebeat to ship logs directly to Elasticsearch. All the services (Elasticsearch, Filebeat and Kibana) are running as Docker containers.
The issue is that some of the Kibana indices are up to 25 GB in size! Also, the Elasticsearch container is constantly under high CPU load.
Can anyone help me reduce the index size? Is this the reason behind the high CPU consumption by Elasticsearch?
25 GB is not too big, even for a single-shard index. It depends on your CPU and the total number of indices and shards, but I don't think it will be a big problem with typical multi-core CPUs.
Are you using Stack Monitoring?
I did not enable Stack Monitoring, but it is present and shows all the stats of indices and shards.
Does Stack Monitoring impact Elasticsearch performance?
What are the specs of the Elasticsearch container? How much memory and CPU does it have?
Also, the size of the indices reflects the number of documents. To reduce it, you would need to see whether you can drop some kinds of messages and store fewer documents.
Are you using dynamic mapping, or did you create a mapping for your index? Dynamic mapping can use a lot of storage, so if you can create a mapping for your index you could save some space.
No, but it may give a hint as to the reason behind the high CPU consumption by Elasticsearch.
Thanks for clearing up my doubt. Until now I was thinking that the 25 GB size per index was the culprit behind the high CPU consumption.
Yes, I'm using dynamic mapping because I'm monitoring and alerting based on the Elasticsearch logs. If an incident occurs in the future, we won't know in advance which fields will be important for diagnosing it. That's the reason I'm skeptical about manual mapping. Please let me know if I'm missing something.
We have a one-node cluster. The Elasticsearch heap size is 5 GB, the maximum number of shards per node is 4000, and the number of replica shards is set to 0. Regarding the number of CPUs, can you explain what you mean?
Also, can you please suggest how to reduce the CPU consumption? It would be really helpful, because it is impacting our production servers.
This document will help you. 4000 shards on a 5 GB heap is 40 times the recommended maximum, which could be the reason for the high CPU consumption.
First, with the default dynamic mapping, string fields are mapped as text with an additional keyword subfield, so by a simple calculation they consume roughly twice the space.
In addition, you can store fields without indexing them to save index size, and reindex them from the _source field later when you realize you need them. See the index mapping parameter.
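As a rough sketch of both points, the Python snippet below builds a hypothetical mappings body (it would go into an index template) that keeps dynamic mapping on, so unknown fields are still captured, but uses a dynamic template to map new strings as a single keyword field instead of text plus keyword. It also maps one bulky field with index set to false, so it stays in _source and could be reindexed later. The template name strings_as_keywords and the field names message and raw_payload are assumptions for illustration only.

```python
import json

# What Elasticsearch 7.x dynamic mapping produces for a JSON string value:
# a full-text "text" field plus a "keyword" subfield, so the value is indexed twice.
DEFAULT_DYNAMIC_STRING = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def build_mappings() -> dict:
    """Hypothetical mappings body: dynamic mapping stays on, but new string
    fields become a single keyword field instead of text + keyword."""
    return {
        "dynamic_templates": [
            {
                # Template name is arbitrary (assumption).
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword", "ignore_above": 1024},
                }
            }
        ],
        "properties": {
            # Free-text log line: full-text search only, no keyword subfield.
            "message": {"type": "text"},
            # Hypothetical bulky field: kept in _source but not indexed,
            # so it can be reindexed later if it turns out to matter.
            "raw_payload": {"type": "text", "index": False},
        },
    }


if __name__ == "__main__":
    print(json.dumps({"mappings": build_mappings()}, indent=2))
```

The trade-off is that fields mapped only as keyword support exact matches and aggregations but not full-text search, which is why the free-text message field is mapped explicitly.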
Those two measures might not be enough for 40 times the recommended number of shards. IMHO, you may need to reconsider whether you can consolidate some small daily indices into weekly indices, and how long you have to keep the indices active. Maybe some older indices could be snapshotted and then deleted. Snapshots consume no CPU power; they cannot be searched, but they can be restored when needed. ILM will help you implement this processing automatically over time.
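A minimal sketch of such an ILM policy, assuming weekly rollover replaces daily indices: the Python function below returns a policy body that rolls over to a new index after 7 days or 50 GB, and deletes indices a given number of days after rollover. The retention value is an assumption you would choose yourself, and taking snapshots before deletion (for example with snapshot lifecycle management) is a separate step not shown here.

```python
import json


def weekly_rollover_policy(retention_days: int, max_size_gb: int = 50) -> dict:
    """Hypothetical ILM policy body: roll over weekly (or at max_size_gb of
    primary data), then delete indices retention_days after rollover."""
    if retention_days < 7:
        raise ValueError("retention should cover at least one rollover period")
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # One index per week instead of one per day.
                        "rollover": {"max_age": "7d", "max_size": f"{max_size_gb}gb"}
                    }
                },
                "delete": {
                    # With rollover, min_age counts from the rollover time.
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Body for: PUT _ilm/policy/<your-policy-name>
    print(json.dumps(weekly_rollover_policy(retention_days=30), indent=2))
```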
If you have a one-node cluster with 5 GB of Heap you should try to keep the number of shards below 100.
There is a recommendation to have a maximum of 20 shards per GB of heap. With 5 GB of heap, that gives a maximum of 100 shards, so with 4000 shards you are far above this recommendation. Your cluster is oversharded, and this can impact the performance of the cluster and the node. I cannot be sure, but this could be the cause of the high CPU.
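The arithmetic behind that rule of thumb, as a small Python sketch (the 20 shards per GB figure is the guideline quoted above, not a hard limit):

```python
def max_recommended_shards(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on shards per node from the 20-shards-per-GB-of-heap guideline."""
    return int(heap_gb * shards_per_gb)


def oversharding_factor(shard_count: int, heap_gb: float) -> float:
    """How many times the guideline the node's shard count is."""
    return shard_count / max_recommended_shards(heap_gb)


if __name__ == "__main__":
    print(max_recommended_shards(5))      # 100
    print(oversharding_factor(4000, 5))   # 40.0
```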
You should find a way to drastically reduce the number of shards or add more nodes to your cluster.
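To see where the shards come from before consolidating, one option is the _cat/indices API with format=json, which returns the primary ("pri") and replica ("rep") counts per index as strings. The Python sketch below assumes an unsecured node at http://localhost:9200; with security enabled you would need to add authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def fetch_cat_indices(base_url: str = "http://localhost:9200") -> list:
    """Fetch per-index stats; assumes no authentication is required."""
    with urlopen(f"{base_url}/_cat/indices?format=json") as resp:
        return json.load(resp)


def total_shards(cat_indices: list) -> int:
    """Sum primaries plus their replicas; _cat returns "pri"/"rep" as strings."""
    return sum(int(i["pri"]) * (1 + int(i["rep"])) for i in cat_indices)


def smallest_indices(cat_indices: list, n: int = 10) -> list:
    """Names of the n indices with the fewest documents (candidates to merge)."""
    ranked = sorted(cat_indices, key=lambda i: int(i.get("docs.count") or 0))
    return [i["index"] for i in ranked[:n]]


if __name__ == "__main__":
    indices = fetch_cat_indices()
    print(len(indices), "indices,", total_shards(indices), "shards")
    print(smallest_indices(indices))
```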
Another recommendation is to keep the size of each shard at around 50 GB or less.
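Put together with the shard-count guideline, this suggests sizing primaries from the expected index size rather than creating many tiny shards. A hedged Python sketch, assuming a 50 GB target per shard:

```python
import math


def primary_shards_for(index_size_gb: float, target_shard_gb: float = 50) -> int:
    """Smallest primary count that keeps each shard at or below the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    print(primary_shards_for(25))   # 1: a 25 GB index fits in a single shard
    print(primary_shards_for(120))  # 3
```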