Getting 429 ES Exception while performing search queries

Hello,
I am using Elasticsearch 5.6.7. I have one node in my cluster with 128 GB RAM and a 16 GB heap.

I have around 14 indices on this node, each with 5 primary shards and 1 replica shard.
Each index is 40-50 GB in size. These indices store data for a particular day, and we have search queries (mainly terms aggregations) which allow querying from the current date back to two weeks ago.
When querying two weeks of data, we get this exception:

    {
      "type": "es_rejected_execution_exception",
      "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@6f7799fa on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6c0d64d2[Running, pool size = 25, active threads = 25, queued tasks = 3410, completed tasks = 1538807]]"
    }

Que.1: Why are we getting so many queued tasks?
Que.2: Will increasing the queue size help?
Que.3: What configuration changes might help if we can't reduce the number of requests to ES?

Thanks in advance!!

How many clients do you have concurrently querying the node? Are they sending a single query at a time? How long do these queries take to complete?

There are multiple queries sent simultaneously by multiple clients. When querying less than one week of data, queries take about 60-80 seconds. For two weeks of data we hit this exception, so we don't know the response time.

What type of storage do you have? SSDs?

What does CPU usage and disk I/O and iowait look like?

We are using HDDs. I can share CPU usage and disk I/O stats once I am able to reproduce the issue in a day or two.

If you have slow storage and queries are piling up, queues tend to fill. I would not be surprised if you saw a dramatic improvement by switching to SSDs, as these tend to handle random disk I/O much better than HDDs.
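If you do want to experiment with the queue in the meantime, the search queue capacity in 5.x is a static node setting (set in `elasticsearch.yml`, requires a restart), and the live queue depth and rejection counts can be checked with the cat API. A minimal sketch, assuming defaults otherwise:

    # elasticsearch.yml -- raise the search queue capacity
    # (this masks slow storage, it does not fix it)
    thread_pool.search.queue_size: 2000

    # check live thread pool state and rejections
    GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected,completed

Note that a larger queue only buys time: if queries arrive faster than the node can complete them, the queue fills again and each queued search holds on to memory in the meantime.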

I will try to explain the scenario: we are sending at most 10 parallel search queries (dividing the two-week window into 10 buckets). Here is a sample query we are using:

    {
      "from": 0,
      "size": 0,
      "query": {
        "bool": {
          "must": [
            { "range": { "timestamp": { "gte": "2019-01-28 00:00", "lte": "2019-01-30 00:00",
                                        "format": "YYYY-MM-dd HH:mm", "time_zone": "+0530" } } },
            { "bool": { "must": [ { "bool": { "should": [
              { "match": { "group1": { "query": "search_str", "type": "phrase" } } },
              { "match": { "group2": { "query": "search_str", "type": "phrase" } } }
            ] } } ] } }
          ]
        }
      },
      "aggregations": {
        "Timestamp": {
          "terms": { "field": "timestamp", "size": 2147483647, "order": { "_term": "desc" } },
          "aggregations": {
            "dimension1": {
              "terms": { "field": "d1", "size": 2147483647 },
              "aggregations": {
                "dimension2": {
                  "terms": { "field": "d2", "size": 2147483647 },
                  "aggregations": {
                    "dimension3": {
                      "terms": { "field": "d3", "size": 2147483647 },
                      "aggregations": { "value": { "avg": { "field": "field_name" } } }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

Que.1: How does Elasticsearch divide these requests into tasks internally? I couldn't find any resources on this.
Que.2: How can we optimize the query to get the same data in less time? We are thinking of using scroll instead of aggregations; which is better?

I'm fairly new to ES so pardon me if I have asked any stupid questions.

Are you seeing any evidence in the logs of long or frequent GC? You are specifying very large `size` parameters for your terms aggregations, which can lead to a lot of unnecessary memory usage. I would recommend tuning this query to make it as efficient as possible, since you run it frequently.
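To make that concrete, a sketch of the aggregation portion with the sizes bounded to the bucket counts you actually expect, rather than `Integer.MAX_VALUE`. The values 1000 and 100 below are placeholders to adjust to your data:

    "aggregations": {
      "Timestamp": {
        "terms": { "field": "timestamp", "size": 1000, "order": { "_term": "desc" } },
        "aggregations": {
          "dimension1": {
            "terms": { "field": "d1", "size": 100 },
            "aggregations": {
              "dimension2": {
                "terms": { "field": "d2", "size": 100 },
                "aggregations": {
                  "dimension3": {
                    "terms": { "field": "d3", "size": 100 },
                    "aggregations": { "value": { "avg": { "field": "field_name" } } }
                  }
                }
              }
            }
          }
        }
      }
    }

Also, if your timestamps fall on regular intervals, a `date_histogram` aggregation on the timestamp field may be a better fit than a terms aggregation ordered by `_term` descending.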

Will specifying a very large `size` param affect memory usage even if we are not getting a large number of buckets?

Yes, I believe that is still the case. This blog post is quite old, but I believe it is still largely relevant.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.