Still getting 'max open shards' error after re-indexing and decreasing shards to 850

Hello,

I'm running ELK 7.4 and have been getting this error in the Logstash logs for basically every event being processed:

[2020-07-15T15:24:06,789][WARN ][logstash.outputs.elasticsearch][enm] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ssoinstr-2020.29", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x111c340c>], :response=>{"index"=>{"_index"=>"ssoinstr-2020.29", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [999]/[1000] maximum shards open;"}}}}

Before, I had a total of ~5000 shards for my 7-node cluster (1 replica per index). Despite the error in the Logstash logs, I could still see the data being indexed via Kibana. But I wanted to get rid of the recurring log errors anyway, so I followed some tips online to lower the number of shards.

After re-indexing my data, the number of shards for my cluster decreased to ~850. I confirmed the number of shards using:

GET _cluster/stats?filter_path=indices.shards.total

I checked the logs and was still getting the error. I restarted the Logstash instances but continued to get the same errors (the errors still said I had 999 open shards).

As a last-ditch effort, I increased the maximum number of open shards to 2000 using the API, but continued to get the error.
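
For reference, raising that limit goes through the cluster settings API, roughly like this (a sketch; the persistent/transient choice and exact value shown here are just an example):

PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}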

Does anyone have any experience with this? While the error doesn't actually seem to be affecting the cluster's performance, it's very annoying and clutters the logs when I'm trying to troubleshoot.

Thank you!!

Are there similar logs in Elasticsearch?

No - I'm not getting any errors in the Elasticsearch logs

How did you lower the number of shards? Deletion? Reindexing? Merging indices?

I re-indexed a large number of smaller indices into a smaller number of larger indices, and then deleted the original smaller indices.
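
Each consolidation was done with the reindex API, roughly of this shape (the source list and destination name here are just illustrative):

POST _reindex
{
  "source": { "index": ["ssoinstr-2020.28", "ssoinstr-2020.29"] },
  "dest": { "index": "ssoinstr-2020-q3" }
}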

There is also a per-node shard limit; the default is 1K, which matches the 999/1000 in the error you're getting. It's controlled via cluster.max_shards_per_node - set that higher to see if your error goes away, and look at the number of shards per node, which is hard to get directly - I think really only by getting a shard list with GET /_cat/shards and counting by node name or ID.
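
For example, pulling just the node column keeps the counting simple (assuming I have the cat parameters right):

GET /_cat/shards?h=node&s=node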

Do you have any closed indices?
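
Something like this should show the status (open vs close) per index, if I recall the cat columns correctly:

GET /_cat/indices?h=index,status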

On my monitoring page it says I have 822 open shards total. To double-check, I saved the output of GET /_cat/shards and wrote a script to count the total number of shards and how many were on each node. I got 822 total and 274 on each node, since I have 3 data nodes.
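
For anyone counting the same way, the allocation cat API seems to report a per-node shard count directly, which would avoid the scripting (I'm going from the docs here rather than what I actually ran):

GET /_cat/allocation?v&h=node,shards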

So it seems like, for whatever reason, Logstash hasn't picked up the change in the number of open shards per node. It continues to tell me that it can't index the event because there are too many open shards, but the event gets indexed anyway. Color me confused.

No closed indices

Logstash doesn't care; the error is a rejection from Elasticsearch. With 822 shards you are not exceeding 1K, of course, so it seems odd.

To 'add' shards on ssoinstr-2020.29 it must be creating that index, I assume, so this does seem strange. I'd suggest raising cluster.max_shards_per_node anyway, as it defaults to 1K, which is the number cited in your error, and seeing if the error goes away. Your cluster is green, right?

It's also very odd that the data is indexed anyway; I guess it might try to create the shard on another node, but the error mentions the cluster, which is also odd. You should confirm your cluster max - which "max open shards" setting are you actually setting? The only setting I know of is per node, but that makes no sense if your cluster limit is 1K on 7 nodes; someone would have had to set it to 1K/7, which is very low.

So verify that cluster.max_shards_per_node hasn't been changed by someone to something lower.
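
Something like this should show whether that setting has been overridden at the persistent or transient level (the filter path is from memory, so adjust as needed):

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node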
