@spinscale, thanks for responding. Please find answers to your queries below:
Do you have any long running scroll searches/snapshots going on, that require certain files to be held open?
-- We do have scroll queries in place (which will soon be replaced with paginated queries), but they are not long running. At any point in time we might see around 100 scroll queries in flight. Also, we have increased the maximum number of open scroll contexts to 10000.
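For reference, the scroll-context limit is a dynamic cluster setting in 7.x, so we raised it roughly like this (host/port `localhost:9200` is a placeholder for our actual endpoint):

```shell
# Raise the maximum number of open scroll contexts cluster-wide
# (the default in ES 7.x is 500). Host/port are placeholders.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search.max_open_scroll_context": 10000
  }
}'
```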
You can also check using lsof if there are files marked as deleted but still held open by the Elasticsearch process. You can use the node stats to check for any open search contexts. And also check the indices part of that output.
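For completeness, this is roughly how we ran those checks (`$ES_PID` and the host/port are placeholders for our actual process ID and endpoint):

```shell
# Look for deleted files still held open by the Elasticsearch process
lsof -p "$ES_PID" | grep -i deleted

# Inspect open search/scroll contexts per node via the node stats API
curl -s "localhost:9200/_nodes/stats/indices/search?pretty"
```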
-- Below is the output of the index stats (we have an 11-node cluster):
In the above result, the count includes all 5 shards.
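The output above was gathered with the index stats API; a sketch of the request we used (the index name `people` is a placeholder for our actual index):

```shell
# Fetch document-count and store-size stats for a single index
curl -s "localhost:9200/people/_stats/docs,store?pretty"
```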
Have you tweaked any Elasticsearch configuration? GC collection or merging options?
-- We did not tweak either of those. However, we did reduce the node query cache from the default 10% of heap (16 GB heap) to 5 MB. This is because our search results change constantly: we run a radial search to find the people within a given radius, and those people's locations are changing all the time. With such a high miss rate, we saw no benefit in keeping a large query cache.
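Since the query cache size is a static node-level setting, we applied the reduction in elasticsearch.yml on every node, along these lines:

```yaml
# elasticsearch.yml on each node: shrink the node query cache
# from the 10%-of-heap default down to a fixed 5 MB.
indices.queries.cache.size: 5mb
```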
Also, what version are you running on exactly? About how much data are we talking when you only consider the 70k documents?
-- We are running on ES 7.5 (recently moved to this version from 2.4). The data is not huge: currently we have 77,223 documents with a total size of 1.74 GB.
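Those numbers come from the cat indices API; a sketch of the check (host/port and the index name `people` are placeholders):

```shell
# Show document count and on-disk store size for the index
curl -s "localhost:9200/_cat/indices/people?v&h=index,docs.count,store.size"
```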
Adding to this, the index size was much smaller in ES 2.4 than in ES 7.5. In ES 2.4, the total size for ~75k docs was only ~50 MB, but it is far larger in the new version. Were there any changes between these versions that could explain this?
We have around 12 fields in the index: one is a geo_point location field, one is a nested object with keyword fields, and the rest are text fields.
The nested object looks something like this: