Shard limit hit

Well, to get down to it: we migrated servers this weekend. My script worked before, but now it doesn't. I keep getting this error:

"[{'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Trying to query 1536 shards, which is over the limit of 1000. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive. It is usually a better idea to have a smaller number of larger shards. Update [action.search.shard_count.limit] to a greater value if you really want to query that many shards at the same time.'}]"

I have been trying all day to get either action.search.shard_count.limit or max_concurrent_shard_requests working and can't quite figure it out. Here is a snippet of my code...

{
  "size": 0,
  "query": {
    "query_string": {
      "query": fleCont
    }
  },
  "aggs": {
    "per_scott": {
      "terms": {
        "script": {
          "lang": "painless",
          "inline": "doc['src_ip'].value + ',' + doc['dst_ip'].value + ',' + doc['dst_port'].value + ',' + doc['proto'].value + ',' + doc['devicename'].value + ',' + doc['policy_id'].value"
        },
        "size": 10000
      }
    }
  }
}
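
For reference, this is roughly how I am sending that body (using the Python requests library; the host and index pattern below are placeholders for my real ones). As far as I can tell, max_concurrent_shard_requests has to go in the URL as a query parameter on _search, while action.search.shard_count.limit from the error looks like a cluster setting I cannot change from the client side:

import json
import requests

# Placeholder body; the real script uses the full dict shown above, with
# fleCont substituted into the query_string.
body = {"size": 0, "query": {"query_string": {"query": "your search terms"}}}

# Attempt: pass max_concurrent_shard_requests as a _search URL parameter.
# It seems to throttle how many shards are searched at once rather than
# lift the 1000-shard limit from the error.
resp = requests.get(
    "https://myeshost:9200/logstash-*/_search",
    params={"max_concurrent_shard_requests": 5},
    data=json.dumps(body),
    headers={"Content-Type": "application/json"},
)
print(resp.json())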

Funny, I just hit this limit last week. It turned out I was querying way more indices than I needed. My query used a wildcard myindex-*. With this, I was querying months of data when I only needed the last thirty days. I changed my logic to account for the date. So now I'm searching myindex-2018.03.*,myindex-2018.02.*, and lightening the load on my es stack. This may not apply to your case, but I thought I'd share my anecdote just in case.


That's definitely a good solution.

Hey, thanks pixel, but yeah, I need to search all of the Logstash indices. It is just strange that my admin would have lowered the allowed shard limit.

I agree it is, but we have 200+ Logstash indices to dig through. I think it would be more detrimental to the server to feed each index in, connect, search, disconnect, etc., than just upping the shard limit.

1000 is the default. Either your admin neglected to migrate a custom setting over, or, more likely, your admin increased the number of shards per index on your new cluster.

Ideally, you will want to keep within the recommended settings and rewrite logic around the limitations. That said, I think I found a post with the answer you are looking for. Changing the limit is a cluster setting that will need to be configured by an administrator:
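
For what it's worth, the admin-side change would look roughly like this (placeholder host, and the new limit value is only an example; raising it has exactly the CPU/memory cost the error message warns about):

import requests

# action.search.shard_count.limit is the dynamic cluster setting named in the
# error; updating it requires admin access. 2000 is only an example value.
requests.put(
    "https://myeshost:9200/_cluster/settings",
    json={"transient": {"action.search.shard_count.limit": 2000}},
)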

I am looking into how to rewrite my logic but I am lost. Do you have any suggestions?

First thing: why are you using so many shards?

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

I do not manage this server. I was told to "search Elasticsearch for X and get the results", so that is what I am doing. I don't mean to sound mean or apathetic, but 1) this just started happening today (it wasn't happening Friday; yes, we migrated over the weekend), and 2) I am at a disadvantage acting as an admin on an Elasticsearch database, because I was literally thrown into this project after I said I don't know it.

I will email the admin and see what they say, but I feel like I need a little better ground to stand on first. I will watch the video and read some more, but still, I am not sure I will have enough.

@dadoonet unfortunately I don't think OP is the admin, so tweaking his cluster may not be an option for him.

@jmgarcia how many indices are you hitting when you make your query? Are your indices historical, as in daily or monthly (e.g. myindex-2018.03)? Are you using wildcards in your index request (e.g. myindex-*)? If so, the workaround for this case would be something like this:

Establish the date range you need to query (i.e. last 24 months)
Divide this range into months (2018.03, 2018.02, 2018.01, 2017.12, etc)
Then join these months into acceptable chunks that fly under your limit:
query1: https://myeshost:9200/myindex-2018.01*,myindex-2017.12*
query2: https://myeshost:9200/myindex-2018.03*,myindex-2018.02*
....etc....
Finally, you would populate a master dict where you manually aggregate the multiple queries into a single search result; a rough sketch of this follows below.
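
Here is a rough sketch of that approach in Python (requests library; the host, index names, months, chunk size, aggregation name, and field are all placeholders, so substitute your own query body):

import requests

# Placeholder months; in practice, generate these from the date range you need.
months = ["2018.03", "2018.02", "2018.01", "2017.12"]
chunk_size = 2  # months per request; pick a size that keeps you under the shard limit

# Placeholder query body; swap in your own query and aggregation.
body = {
    "size": 0,
    "query": {"query_string": {"query": "your search terms"}},
    "aggs": {"per_scott": {"terms": {"field": "src_ip", "size": 10000}}},
}

master = {}  # master dict: aggregation key -> combined doc count

for i in range(0, len(months), chunk_size):
    indices = ",".join("myindex-%s*" % m for m in months[i:i + chunk_size])
    resp = requests.get(
        "https://myeshost:9200/%s/_search" % indices,
        json=body,
    ).json()
    # Note: counts merged this way are only as complete as each chunk's top-N buckets.
    for bucket in resp["aggregations"]["per_scott"]["buckets"]:
        master[bucket["key"]] = master.get(bucket["key"], 0) + bucket["doc_count"]

print(master)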

You are correct, @pixelrebel, I am not an admin.

When I run the cat command I get a total of 128 different Logstash indices that I need to search through. They are moved to a different server every ~30 days, and there is no way I will be able to access those servers.

This limit on the number of shards that can be queried was introduced in Elasticsearch 5.x, which might be what you migrated to over the weekend.

As David stated, it does look like you have a lot of shards. If you can provide the output of the cluster stats API we can get a better idea about the state of the cluster.
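
Something along these lines should be enough (placeholder host below); the output is aggregate counts rather than individual index or server names:

import requests

# _cluster/stats returns aggregate counts (indices, shards, nodes, memory use)
# rather than individual index names.
stats = requests.get("https://myeshost:9200/_cluster/stats").json()
print(stats["indices"]["count"], stats["indices"]["shards"]["total"])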

Alright, after more digging and talking with the admin, they changed the requirements, so now I am not hitting the shard limit. A couple of queries are close (I think one hits 989 shards), but for now the issue is solved.

Right now I would need to scrub the cluster stats too much, @Christian_Dahlqvist, and I don't have enough time to do that and post the results here. Tomorrow or the following day I will, though. I know I will be running into this problem again in the future, so all I need is a little more time to sort other stuff out and then we will be golden.

Thank you to everybody who helped.

Adios.... for now.

The output of the cluster stats (not state) API typically does not contain anything particularly sensitive. What is it that needs to be scrubbed?

They don't want the server names or anything like that released, including the Logstash-specific index names. I will do my best to get the results today.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.