We have a total of 55 indices with 228 shards and 4.8 TB of disk space.
Our indexing rate ranges from 3000 to 8000 docs per second, with a total of 250 million docs coming in per day.
We have 10 data nodes, each with 2 cores and 5 GB of RAM (50% heap), and around 4.5 billion documents in total at the moment.
Here's our heap usage over the past week (max heap is 25 GB):
When I run a query to get all the data (4.8 billion docs), it sometimes passes and sometimes fails, irrespective of how much indexing is happening at that time. CPU utilization on all the nodes reaches 100% while the query is running.
No-one is running queries on the cluster except for me.
The search thread pool queue count doesn't exceed 60 while the query is running.
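For anyone who wants to reproduce the queue numbers without a dashboard, here is a minimal sketch that summarizes the JSON returned by `GET _cat/thread_pool/search?format=json&h=node_name,name,active,queue,rejected`. The helper name `summarize_search_queue` and the sample payload are hypothetical and only for illustration; the node names and counts are not from our cluster.

```python
import json

def summarize_search_queue(cat_json: str) -> dict:
    """Return the per-node maximum and the cluster-wide sum of the search queue.

    Expects the JSON body of the _cat/thread_pool API, where every value
    (including the counters) is returned as a string.
    """
    rows = [r for r in json.loads(cat_json) if r["name"] == "search"]
    queues = [int(r["queue"]) for r in rows]
    return {"max_per_node": max(queues, default=0), "total": sum(queues)}

# Hypothetical sample response (made-up node names and counts).
sample = json.dumps([
    {"node_name": "node-a", "name": "search", "active": "4", "queue": "35", "rejected": "0"},
    {"node_name": "node-b", "name": "search", "active": "4", "queue": "20", "rejected": "0"},
])

summary = summarize_search_queue(sample)
print(summary)
```

Polling this while the query runs, and also watching the `rejected` column, would show whether searches are being queued or dropped on any single node.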
The official docs insist on setting the heap to 50% of available RAM, but it seems like we are not using most of the available heap. Do you think search would improve if I decreased the heap size to 1.5 GB?
Or is there another way to improve search performance?
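To put the heap question in numbers, here is a rough back-of-the-envelope sketch using the figures from this post. It assumes the 4.8 TB is data actually stored on disk and that all RAM outside the JVM heap is available to the OS page cache; both are simplifications, and the function name `cache_coverage` is just for illustration.

```python
NODES = 10
RAM_PER_NODE_GB = 5.0
DATA_ON_DISK_GB = 4800.0  # 4.8 TB, treated as 4800 GB (assumption)

def cache_coverage(heap_per_node_gb: float) -> float:
    """Fraction of on-disk data that could fit in RAM left outside the heap.

    Simplification: treats all non-heap RAM as OS page cache.
    """
    off_heap_gb = (RAM_PER_NODE_GB - heap_per_node_gb) * NODES
    return off_heap_gb / DATA_ON_DISK_GB

current = cache_coverage(2.5)   # 50% heap, as configured today
proposed = cache_coverage(1.5)  # the smaller heap being considered

print(f"current:  {current:.2%}")
print(f"proposed: {proposed:.2%}")
```

Under these assumptions, shrinking the heap moves the cluster from roughly 25 GB to 35 GB of potential page cache against 4.8 TB of data, so well under 1% of the data could be cached either way.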
The query fails in Kibana with the following error message (not sure if this matters, because sometimes the query passes):
SyntaxError: Unexpected token < in JSON at position 0
at JSON.parse (<anonymous>)
at u.$digest (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:100155)
at u.$apply (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:102334)
at w (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:68618)
at XMLHttpRequest.b.onload (https://elk.company.com/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:67981)
Disk utilization is nowhere near its peak. We use 10 EBS volumes of 1000 GB each, which means we have a 3000 IOPS limit, and when I just ran the query, the read IOPS didn't even reach a sum of 100.
I feel like this cannot be an IO issue. We just rebuilt the cluster, and the nodes are able to read around 1 GB per second at times. In other words, the nodes were not using the full potential of the EBS volumes when I ran a query to get the data from the last 30 days.
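As a quick sanity check on the IO headroom, here is the arithmetic as a small sketch. Whether the 3000 IOPS limit applies per volume or to the cluster as a whole depends on the volume type, so both readings are computed; the names below are for illustration only.

```python
OBSERVED_READ_IOPS = 100  # upper bound on the summed read IOPS seen during the query
IOPS_LIMIT = 3000         # the limit quoted above
VOLUMES = 10

def iops_utilization(observed: float, limit: float) -> float:
    """Fraction of the IOPS limit that the observed load consumes."""
    return observed / limit

# Reading 1: 3000 IOPS is the limit for the whole cluster (most pessimistic).
cluster_limit = iops_utilization(OBSERVED_READ_IOPS, IOPS_LIMIT)

# Reading 2: 3000 IOPS is the limit of each of the 10 volumes.
per_volume_limit = iops_utilization(OBSERVED_READ_IOPS, IOPS_LIMIT * VOLUMES)

print(f"cluster-wide limit: {cluster_limit:.2%}")
print(f"per-volume limit:   {per_volume_limit:.2%}")
```

Even under the pessimistic reading, the query used only a few percent of the available read IOPS, which is consistent with the nodes being CPU-bound rather than IO-bound.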