What shard configuration and hardware specifications do I need?

I'm using Filebeat, Logstash and Elasticsearch to collect logs.

Today, I encountered the following error.

# tail /var/log/logstash/logstash-plain.log
[2021-06-16T10:10:48,509][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x2522ee6>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,509][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x56befee7>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,509][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x777ddf23>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,509][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x520ee1a1>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,510][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x2b27d278>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,510][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x902dda3>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,510][WARN ][logstash.outputs.elasticsearch][main][c24e7e41b335e6af9290ccdd216db539ac55eeb30dbb4362e3e58cc79b91102b] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-cas-server-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x4693426c>], :response=>{"index"=>{"_index"=>"madam-cas-server-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,510][WARN ][logstash.outputs.elasticsearch][main][3fa00590784976059791f8c2c84cc91e565e159f5722752cfffdc4a041e5b050] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"sro-api-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x6c18da95>], :response=>{"index"=>{"_index"=>"sro-api-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,512][WARN ][logstash.outputs.elasticsearch][main][f0a309d0296c0cfaa8ca0029d1526d531aee25749bcb74718363b7d158aa718e] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-api-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1439a930>], :response=>{"index"=>{"_index"=>"madam-api-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}
[2021-06-16T10:10:48,512][WARN ][logstash.outputs.elasticsearch][main][f0a309d0296c0cfaa8ca0029d1526d531aee25749bcb74718363b7d158aa718e] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"madam-api-2021.06.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x7e7dcd79>], :response=>{"index"=>{"_index"=>"madam-api-2021.06.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1076]/[1000] maximum shards open;"}}}}

This means the events cannot be indexed because creating today's indices would push the cluster past its limit of open shards (1076 against a maximum of 1000). I have already taken steps to raise the limit from 1000 to 2000, referring to the following article.

However, I think this is a temporary measure.
Upon further investigation, I found the following article.

I intend to collect 20 different kinds of logs, creating one dated index per kind per day, and to keep the logs for 60 days.
I am preparing a 4TB disk to hold these 20 daily indices for 60 days (we think this will fit).
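As a rough check of where the shard count comes from (assuming the default of one primary and one replica per daily index, which matches the "[2] total shards" in the error above):

20 index types × 60 days = 1,200 indices
1,200 indices × (1 primary + 1 replica) = 2,400 shards, or 1,200 shards with replicas disabled

Either way this is well above the default limit of 1,000 shards for a single data node.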

Given this, what settings do I need in order to avoid the error above?
Or what hardware specifications would be needed to handle it?

Are they all the same format? If not, then grouping similar ones into shared indices makes sense, to limit the kind of shard explosion you are seeing.

As for your other questions, the current best approach is to use ILM to optimise sharding.
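For illustration, a minimal ILM policy along these lines would keep new indices in the hot phase, forcemerge and mark them read-only in the warm phase, and delete them after 60 days. The policy name and the phase timings are only placeholders; set_priority, readonly, forcemerge and delete are standard ILM actions:

$ curl -X PUT "localhost:9200/_ilm/policy/logs-60d" -H "Content-Type: application/json" -d '
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "set_priority": { "priority": 100 } } },
      "warm":   { "min_age": "1d",  "actions": { "readonly": {}, "forcemerge": { "max_num_segments": 1 } } },
      "delete": { "min_age": "60d", "actions": { "delete": {} } }
    }
  }
}'

The policy is then attached to new indices through an index template via the index.lifecycle.name index setting.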

To be more precise, there are five different formats that are the same, and we collect logs from four different services.
One service has 2 to 10 servers outputting logs daily.
Is it unreasonable to do this with a single Elasticsearch node?

We use Curator for the index lifecycle, and we delete indices older than 60 days.

Not sure I follow how they can be different and the same?

Do you mean that it would be easier to make suggestions if there was a specific log format?

As a temporary workaround, I ran the following command.

$ curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "2000" } }'
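(To confirm that the setting took effect and to see how many shards the cluster currently has, these read-only calls can be used:)

$ curl "localhost:9200/_cluster/settings?pretty"     # shows the persistent setting
$ curl "localhost:9200/_cluster/health?pretty"       # shows active and unassigned shard counts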

What I want to know is what determines the practical upper limit on the number of shards.
If it comes down to machine specs, I would like to understand the relationship.
The reason I want to know is so that I can work out what server specification can hold the volume of logs I want to store in Elasticsearch (i.e., whatever it is that drives the shard count up), and whether I need to expand the server.
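(To see which indices the shards come from, something like the following works; the column selection is just one possibility:)

$ curl "localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=index"
$ curl -s localhost:9200/_cat/shards | wc -l     # rough total shard count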

Would it be better to be more specific?

I have a few questions.

  • If I don't create replicas and just create one primary shard, can I assume that the number of shards equals the number of indexes?

  • Is it the disk size that you are referring to here? Or is it the memory size?

    How many shards should I have in my Elasticsearch cluster? | Elastic Blog
    TIP: Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.

  • Can we raise the upper limit on the number of shards if we have enough disk size?

  • If we keep the size per shard at 40GB, is it correct that we can set the limit based on the number of shards as long as the total shard size does not exceed the disk space?

  • Does it count the number of shards even when no replicas are created?

    • In other words, if you have 1 primary shard and 0 replicas, is the shard count 1 or 2?

If each index has one primary shard and there are no replicas configured that would be correct.

The blog post refers to Java heap size. Each shard has some overhead in terms of heap usage and also contributes to the size of the cluster state, as information about the shards and mappings needs to be stored there. For large clusters the size of the cluster state can become a limiting factor even if heap usage looks OK, as updates to the cluster state may get slow and changes can be slow to propagate to other nodes.

As described earlier, the amount of data a node can handle is often limited by heap size rather than by disk space.

The blog post outlines a guideline for the number of shards per GB of heap, but since memory usage per shard depends on shard size as well as on mappings, there is no guarantee that a node can handle a given number of large shards at that limit.
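(One way to keep an eye on this is the cat nodes API; the column selection here is just an example:)

$ curl "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max"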

Yes, primary and replica shards both count towards the limit.

That is a single shard.

I would provide the following recommendations:

  • If you are going to run a single-node cluster (it sounds like this might be what you are planning), you will never have any replica shards, as Elasticsearch will not allocate a replica to the same node that holds the primary. This leaves you without high availability, which may be OK for your use case. In that case I think you can increase the limit to 2000 as long as you have enough heap to handle the data, as 2000 indices in the cluster state should be fine. If you were planning a larger cluster with more data nodes, I would generally avoid changing the limit.
  • In order to reduce heap usage and increase the amount of data your node can handle, I would recommend forcemerging indices down to a single segment and making them read-only once they are no longer written to, as this can reduce heap usage significantly. You should be able to automate this using ILM by specifying a hot and a warm phase without relocation of shards; the sketch below shows the underlying API calls.
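As a rough illustration of those two steps against the REST API directly (the index name is just an example of a previous day's index; in practice ILM or Curator would drive this rather than running it by hand):

$ curl -X PUT "localhost:9200/madam-api-2021.06.15/_settings" -H "Content-Type: application/json" -d '{ "index.blocks.write": true }'     # block further writes
$ curl -X POST "localhost:9200/madam-api-2021.06.15/_forcemerge?max_num_segments=1"     # merge down to one segment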

Thank you for your answer!!!
I'm slowly getting my head around this.

In particular, thank you for your recommendations.

First of all, as you have suggested, I will not be allocating replicas in a single node cluster.
The reason is that I am collecting server logs, and the original log files remain on each server, so I don't think replicas are necessary in Elasticsearch.
While keeping the daily logs for 60 days, the number of shards recently exceeded 1000 and Logstash stopped indexing.
As a first-aid measure I raised the shard limit to 2000, and I was worried that this had been a mistake, so I am relieved.

Your second suggestion is very interesting!!!
Could you please elaborate on how to do that?

I use Elastic's Curator to manage my index lifecycle.
Since I create indices daily, it is very easy to configure Curator to delete indices older than 60 days.
However, heap usage is always hovering near the limit (16GB), and we want to get this under control.

Apart from the index collecting today's logs, nothing needs to be written to, so the older indices can be made read-only (I would like to know how to do that).
Also, if I consolidate each index into one segment, can I still easily delete the indices by date? (I would like to know how to do that too, if possible.)

If you only have a 16GB heap, that will limit the amount of data you can hold on the node, and you should not increase the shard limit in that case. I assumed you were using the maximum recommended heap of around 30GB.

In a single node cluster you can never allocate replica shards so that is not a concern.

Curator has even more options and features than ILM, so you can do everything I described using Curator. However, I have not configured Curator in a while, so I will not be able to help with the specifics.

Forcemerging into a single segment just optimises the shards; it does not affect how you manage retention.
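(If you want to verify the effect, the per-shard segment count can be checked with the cat segments API; the index name is just an example:)

$ curl "localhost:9200/_cat/segments/madam-api-2021.06.15?v"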

Hmmm... if I only have a 16GB heap, should I leave the maximum number of shards at 1000?
If so, does that mean I need to increase the memory to 30GB or add another server?

I see.
Curator is a very nice tool. I'll look into it more.

So I don't need to change my existing Curator retention settings, right? I'm relieved.

Yes, I think that is what I would recommend.

You will, however, need to add additional Curator actions to perform the forcemerge and make the indices read-only.

I understand. I'll try adding more memory or another server.
(Thank you very much. I've been wanting advice like this for a long time!)

Thanks for the keywords.
Does the forcemerge you refer to mean the one described on the following page?

If so, I'll read it and add it to my Curator actions.
