ELK cluster is not stable due to several shard and JVM configuration issues

This is not the first time I am sharing these issues, and hopefully this time somebody will help me with them!
I have an ELK cluster running on premises on a Rancher-based Kubernetes cluster of 3 VMs, each with 32 GB RAM, 8 CPUs, and unlimited storage (volume mount).
The cluster stores logs from Python automation scripts, which are sent to Logstash already structured using the logstash_async module. Every script run creates a new unique index and writes its logs into it, so the cluster ends up with hundreds of indices.
I deployed Logstash (3 replicas), Elasticsearch (3 replicas), and Kibana (1 replica) using Helm charts (I can share them if needed).
I did not configure anything shard-related in any of the charts.
The application works fine for a period of time, say a month, and then everything starts to break, mainly in the Elasticsearch pods:

> 7/28/2021 6:09:42 PM [2021-07-28T15:09:42,731][WARN ][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"qascreencapture-1627484927937893928", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x34f50dc7>], :response=>{"index"=>{"_index"=>"qascreencapture-1627484927937893928", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}

I fixed that with the following:

```
curl -XPUT -H 'Content-Type: application/json' 'IP-OF-ELASTIC-SERVER:9200/_cluster/settings' \
  -d '{ "persistent" : { "cluster.max_shards_per_node" : 5000 } }'
```
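In case it helps anyone hitting the same limit, here is how I checked how many shards were actually open before raising the ceiling (a diagnostic sketch against a live cluster; `IP-OF-ELASTIC-SERVER` is a placeholder for your node address):

```shell
# Total open shards across the cluster
curl -s 'IP-OF-ELASTIC-SERVER:9200/_cluster/health?filter_path=active_shards'

# Which indices contribute the most shards (count per index, largest first)
curl -s 'IP-OF-ELASTIC-SERVER:9200/_cat/shards?h=index' | sort | uniq -c | sort -rn | head
```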

I am pretty sure this is not the correct way to fix it and that there are other solutions, so I need your suggestions here. After that everything worked well until I got new errors related to the JVM and heap, mainly caused by the change I made to the shard limit.

Please help me.


This seems to be a very inefficient and wasteful way to index data into Elasticsearch, and it will not scale well: each shard has overhead and increases the size of the cluster state. Please read this blog post on sharding and find a way to avoid creating a new index per script for low volumes of data. If you want to store data for a long time or expect large volumes, you need to make sure your shards are at least a few tens of GB in size.
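To make the suggestion above concrete, one common pattern is to write all script logs into a shared time-based index and carry the script identity as a document field instead of encoding it in the index name. A minimal sketch (the `automation-logs` prefix and both helper names are my own assumptions, not something from this thread):

```python
from datetime import datetime, timezone

def target_index(prefix="automation-logs", when=None):
    """Route every script to one daily index instead of one index per script."""
    when = when or datetime.now(timezone.utc)
    return f"{prefix}-{when:%Y.%m.%d}"

def make_doc(script_id, message, when=None):
    """Keep the script identity as a field, so one index serves all scripts."""
    when = when or datetime.now(timezone.utc)
    return {
        "@timestamp": when.isoformat(),
        "script_id": script_id,  # previously this was the per-script index name
        "message": message,
    }
```

With this layout a month of logs lives in roughly 30 indices regardless of how many scripts run, and Kibana can still filter per script on the `script_id` field.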


Thanks :slight_smile:
I will change the way I am indexing my logs.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.