Elasticsearch timeout for search query


Elasticsearch version 7.6.1

We have a total of 55 indices with 228 shards and a disk space of 4.8 TB

Our indexing rate ranges from 3,000 to 8,000 docs per second, with about 250 million docs arriving per day.
We have 10 data nodes, each with 2 cores and 5 GB of RAM (50% allocated to heap), and around 4.5 billion documents in total at the moment.

Here's our heap usage over the past week (max heap is 25 GB).
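
As a rough sanity check of the figures quoted above, the per-node numbers can be worked out as follows. This is only a sketch: it assumes shards and data are spread evenly across the data nodes, that the 228 shards include any replicas, and that the 4.8 TB figure is data on disk (1 TB taken as 1000 GB).

```python
# Figures quoted in the post.
DATA_NODES = 10
RAM_PER_NODE_GB = 5
HEAP_FRACTION = 0.5          # "50% heap"
TOTAL_SHARDS = 228           # assumed to include replicas
TOTAL_DISK_TB = 4.8          # assumed to be data on disk
DOCS_PER_DAY = 250_000_000

heap_per_node_gb = RAM_PER_NODE_GB * HEAP_FRACTION
total_heap_gb = heap_per_node_gb * DATA_NODES

# Assumption: an even spread of shards and data across nodes.
shards_per_node = TOTAL_SHARDS / DATA_NODES
avg_shard_size_gb = TOTAL_DISK_TB * 1000 / TOTAL_SHARDS

# Daily volume averaged over 24 hours (86,400 seconds).
avg_docs_per_second = DOCS_PER_DAY / 86_400

print(f"heap per node:        {heap_per_node_gb:.1f} GB")
print(f"total heap:           {total_heap_gb:.1f} GB")
print(f"shards per node:      {shards_per_node:.1f}")
print(f"average shard size:   {avg_shard_size_gb:.1f} GB")
print(f"average indexing rate: {avg_docs_per_second:.0f} docs/s")
```

Under these assumptions each node has 2.5 GB of heap and roughly 23 shards, and the daily total works out to just under 3,000 docs per second when averaged over a full day.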

When I run a query to get all the data (4.8 billion docs), the query sometimes succeeds and sometimes fails, irrespective of how much indexing is happening at the time.
CPU utilization on all the nodes reaches 100% while the query is running.

No one is running queries on the cluster except me.
The search thread pool queue count doesn't exceed 60 while the query is running.

Even though the official docs recommend setting the heap to 50% of available RAM, it seems we are not using most of the available heap. Do you think search would improve if I decreased the heap size to 1.5 GB?
Or is there another way to improve search performance?

  • Thank you

What type of query do you run to get all data? How does it fail? Have you looked at disk utilization and iowait while you are running the query?
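
One way to check iowait on a Linux data node while the query runs is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the counters. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration, and tools such as `iostat` report the same information.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Return the numeric counters from a '/proc/stat' aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    # Only the first eight counters are summed; guest time is already
    # included in the user counter.
    deltas = [b - a for a, b in zip(first[:8], second[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    iowait_delta = deltas[4]  # fifth counter is iowait
    return 100.0 * iowait_delta / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


# Usage on a node: take one sample, wait a few seconds while the query
# runs, take another, then call iowait_percent(sample_1, sample_2).
```

A consistently high value while the query runs would point towards storage; a low value alongside 100% CPU would point elsewhere.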

Thank you for a quick reply

    SyntaxError: Unexpected token < in JSON at position 0
        at JSON.parse (<anonymous>)
        at https://elk.company.com/bundles/commons.bundle.js:3:3380253
        at https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:94842
        at https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:94980
        at u.$digest (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:100155)
        at u.$apply (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:102334)
        at https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:65714
        at w (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:68618)
        at XMLHttpRequest.b.onload (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:67981)
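
For context on the message itself: `Unexpected token < in JSON at position 0` means the response body started with `<`, so something that was not JSON was handed to the JSON parser. A common cause, not confirmed in this thread, is a proxy or load balancer returning an HTML error page (for example on a gateway timeout) in place of the Elasticsearch response. The sketch below reproduces the same class of failure in Python; the HTML body and the helper name are hypothetical.

```python
import json


def looks_like_html_error(body: str) -> bool:
    """True if the body fails JSON parsing at offset 0 and starts with '<'."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return err.pos == 0 and body.startswith("<")
    return False


# Hypothetical bodies, for illustration only.
html_body = "<html><body>504 Gateway Time-out</body></html>"
json_body = '{"took": 5, "timed_out": false}'

print(looks_like_html_error(html_body))
print(looks_like_html_error(json_body))
```

Checking the raw response in the browser's network tab would show whether this is what happened here.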

Any update here, please?

  • Thank you

Also, on most of our nodes the JVM heap usage looks like a "saw-tooth". AFAIK this indicates over-allocation of RAM to the JVM, right?

I feel this cannot be an I/O issue. We just rebuilt the cluster, and the nodes are sometimes able to read about 1 GB per second. In other words, the nodes were not using the full potential of the EBS volumes when I ran a query to get the data from the last 30 days.

That saw-tooth pattern is what you are looking for and is very healthy. You appear to have a reasonable heap size.

I am not sure about the Kibana error though.

Can you capture the query being sent and try running this from the dev console in Kibana? Enabling the slow log and looking at what this shows might also be a good idea.
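
A minimal sketch of enabling the search slow log, assuming the standard `index.search.slowlog.threshold.*` index settings. The thresholds and the index pattern `my-index-*` are placeholders, not values from this thread. The script prints a request that can be pasted into the Kibana dev console.

```python
import json


def slowlog_settings(warn="10s", info="5s", debug="2s", trace="500ms"):
    """Build search slow log thresholds for the query and fetch phases."""
    settings = {}
    for phase in ("query", "fetch"):
        prefix = f"index.search.slowlog.threshold.{phase}"
        settings[f"{prefix}.warn"] = warn
        settings[f"{prefix}.info"] = info
        settings[f"{prefix}.debug"] = debug
        settings[f"{prefix}.trace"] = trace
    return settings


def slowlog_request(index_pattern, settings):
    """Return (method, path, body) for the update index settings API."""
    return "PUT", f"/{index_pattern}/_settings", json.dumps(settings, indent=2)


# 'my-index-*' is a placeholder; substitute the indices being queried.
method, path, body = slowlog_request("my-index-*", slowlog_settings())
print(f"{method} {path}")
print(body)
```

Searches slower than a threshold are then written to the search slow log on each data node, which should show whether the query or the fetch phase takes the time.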
