Elasticsearch timeout for search query

Rakesh_B · April 28, 2020, 3:06am

Hi,

Elasticsearch version 7.6.1

We have a total of 55 indices with 228 shards and 4.8 TB of disk space.

Our indexing rate ranges from 3,000 to 8,000 docs per second, with a total of about 250 million docs coming in per day.
We have 10 data nodes, each with 2 cores and 5 GB of RAM (50% heap), and around 4.5 billion documents in total at the moment.

Here's our heap usage over the past week (max heap is 25 GB):

When I run a query to get all the data (4.8 billion docs), the query sometimes succeeds and sometimes fails, irrespective of how much indexing is happening at that time.
CPU utilization on all the nodes reaches 100% while the query is running.

No one is running queries on the cluster except me.
The search thread pool queue count doesn't exceed 60 while the query is running:

Even though the official docs recommend setting the heap to 50% of available RAM, it seems we are not using most of the available heap. Do you think search would improve if I decreased the heap size to 1.5 GB?
Or is there another way to improve search performance?
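As a rough sanity check on the numbers quoted in this post, the sketch below works through the arithmetic in Python. It only uses figures stated above; the "non-heap" value is an assumption-laden upper bound on memory left for the filesystem cache (it ignores off-heap JVM memory and other processes), and the code says nothing about whether a smaller heap would actually speed up searches.

```python
# Figures quoted in the post; everything derived from them is simple arithmetic.
DATA_NODES = 10
RAM_PER_NODE_GB = 5.0
DOCS_PER_DAY = 250_000_000
TOTAL_SHARDS = 228
SECONDS_PER_DAY = 24 * 60 * 60


def per_node_memory(heap_gb: float) -> dict:
    """Split one node's RAM into JVM heap and the remainder.

    The remainder is only an upper bound on what the OS could use as
    filesystem cache; off-heap JVM memory and other processes are ignored.
    """
    return {"heap_gb": heap_gb, "non_heap_gb": RAM_PER_NODE_GB - heap_gb}


current = per_node_memory(RAM_PER_NODE_GB * 0.5)  # the "50% heap" setup
proposed = per_node_memory(1.5)                   # the 1.5 GB idea from the post

total_heap_gb = current["heap_gb"] * DATA_NODES
shards_per_node = TOTAL_SHARDS / DATA_NODES
avg_docs_per_second = DOCS_PER_DAY / SECONDS_PER_DAY

print(f"current per node:  {current}")
print(f"proposed per node: {proposed}")
print(f"total heap across cluster: {total_heap_gb} GB")
print(f"average shards per node: {shards_per_node:.1f}")
print(f"average indexing rate: {avg_docs_per_second:.0f} docs/s")
```

The total heap matches the 25 GB quoted above. The daily total works out to an average of roughly 2,900 docs per second, slightly under the quoted 3,000 to 8,000 range, so one of those two figures is presumably approximate.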

Thank you

Christian_Dahlqvist · April 28, 2020, 5:15am

What type of query do you run to get all data? How does it fail? Have you looked at disk utilization and iowait while you are running the query?

Rakesh_B · April 28, 2020, 6:00am

Thank you for the quick reply.

I run the query in the Kibana UI like this to get all the data from the last 25 days:

[Screenshot: Screen Shot 2020-04-27 at 10.38.47 PM]
It fails in Kibana with the following error message (not sure if this matters, because sometimes the query succeeds):

    SyntaxError: Unexpected token < in JSON at position 0
        at JSON.parse (<anonymous>)
        at https://elk.company.com/bundles/commons.bundle.js:3:3380253
        at https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:94842
        at https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:94980
        at u.$digest (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:100155)
        at u.$apply (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:102334)
        at https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:65714
        at w (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:68618)
        at XMLHttpRequest.b.onload (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:67981)

Disk utilization is nowhere near its peak. We use 10 EBS volumes of 1000 GB each, which gives us a 3000 IOPS limit, and when I just ran the query, the read IOPS didn't even reach a sum of 100.

[Screenshot: Screen Shot 2020-04-27 at 10.46.30 PM]
Here's the iowait graph for the underlying EC2 instances (m5.4xlarge, 16 cores with 64 GB of RAM) that host the Elasticsearch data nodes:

[Screenshot: Screen Shot 2020-04-27 at 10.53.41 PM]

Rakesh_B · April 30, 2020, 12:19am

Any update here, please?

Thank you

Rakesh_B · May 1, 2020, 12:47am

Also, on most of our nodes the JVM heap usage looks like a "saw-tooth". AFAIK this indicates over-allocation of RAM for the JVM, right?

Rakesh_B · May 2, 2020, 12:47am

I feel like this cannot be an IO issue. We just rebuilt the cluster, and the nodes can sometimes read around 1 GB per second. What I mean is that the nodes were not using the full potential of the EBS volumes when I ran a query to get the data from the last 30 days.

Christian_Dahlqvist · May 2, 2020, 7:23am

That saw-tooth pattern is what you are looking for and is very healthy. You appear to have a reasonable heap size.

I am not sure about the Kibana error though.
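For what it's worth, "Unexpected token < in JSON at position 0" is what a JSON parser reports when the body it receives starts with `<`, which is typical of an HTML page rather than a JSON search response. Whether that is what happened here (for example, an error page from something sitting between the browser and Kibana) is only a guess. The Python sketch below reproduces the same class of failure with a made-up HTML body; the body text is hypothetical and not taken from this cluster.

```python
import json

# Hypothetical response body: an HTML error page of the kind a proxy might
# return. This is an assumption for illustration, not the actual response.
html_body = "<html><body><h1>504 Gateway Time-out</h1></body></html>"


def try_parse(body: str):
    """Return (parsed, None) on success, or (None, error position) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as err:
        # err.pos is the character offset where parsing gave up.
        return None, err.pos


parsed, error_pos = try_parse(html_body)
print(f"parsed={parsed!r}, failed at position {error_pos}")

# A well-formed JSON body parses without an error position.
print(try_parse('{"took": 5, "timed_out": false}'))
```

Looking at the raw response in the browser's network tab when the failure occurs would show whether the body really is HTML and where it comes from.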

Can you capture the query being sent and try running this from the dev console in Kibana? Enabling the slow log and looking at what this shows might also be a good idea.
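A minimal sketch of the slow log suggestion, assuming the per-index search slow log settings available in Elasticsearch 7.x. The Python below only builds the request text to paste into the Kibana dev console; it does not contact the cluster. The index pattern `my-index-*` and the threshold values are hypothetical placeholders and should be adjusted to the real indices and to how slow "slow" is for this workload.

```python
import json


def slowlog_settings(query_warn="10s", query_info="5s", fetch_warn="1s"):
    """Build an index settings body that turns on the search slow log.

    The threshold values are illustrative defaults, not recommendations.
    """
    return {
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.query.info": query_info,
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
    }


def console_request(index_pattern, settings):
    """Format the settings update as a dev console request."""
    body = json.dumps(settings, indent=2)
    return f"PUT /{index_pattern}/_settings\n{body}"


# "my-index-*" is a hypothetical index pattern.
request_text = console_request("my-index-*", slowlog_settings())
print(request_text)
```

Queries that exceed a threshold are then written to the search slow log on the data nodes, which makes it possible to see which shards and which phase (query or fetch) take the time.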

system · May 30, 2020, 7:23am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
