We have an ES cluster with 3 master nodes and 6 data nodes. Each data node has 30 GB RAM and 8 cores, with 16 GB allocated to the JVM heap and the rest left for the filesystem cache (i.e. Lucene index caching).
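For reference, the heap allocation is done the usual way in jvm.options on each data node, with min and max pinned to the same value:

```
# jvm.options on each data node
-Xms16g
-Xmx16g
```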
We use ES mostly as a time-series database. We are not using Druid because our use case involves updating the dimensional data, i.e. updating documents whenever the product data changes.
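To illustrate, a product-data change translates into a partial document update along these lines (ES 7+ syntax; the index name, document ID, and fields here are made up):

```
POST /sales-2023.01/_update/12345
{
  "doc": {
    "product_name": "Updated product name",
    "product_category": "electronics"
  }
}
```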
Our index is denormalized as per ES best practices: the same document contains both the product info and the sale (or other) facts. We create one index per month, each with 6 shards and 2 replicas; each monthly index is about 30 GB in size and holds about 10 million documents, and we store 3 years' worth of data this way. At the service layer we have optimizations such as: if someone queries 3 months' worth of data, the query we fire at ES refers only to those 3 monthly indices.
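In other words, a 3-month query from the service layer targets just the relevant monthly indices, roughly like this (index naming and field are hypothetical):

```
GET /sales-2023.01,sales-2023.02,sales-2023.03/_search
{
  "query": {
    "range": {
      "sale_date": { "gte": "2023-01-01", "lt": "2023-04-01" }
    }
  }
}
```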
Swap is turned off on all nodes, the open-files limit is set to 131070 on all nodes, and we run a force merge on all newly created indices on a weekly basis to reduce the number of segments to 6.
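The weekly force merge is the standard _forcemerge call, e.g. (hypothetical index name):

```
POST /sales-2023.01/_forcemerge?max_num_segments=6
```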
Now, coming to the main problem: as you can see in the attached image, JVM memory usage on all ES nodes is always above 40%, and the attached screenshot was taken when there were no loads or queries running on top of ES. I am not sure how to debug this. Any pointers on how I can drill down to the root cause of the issue?
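In case the raw numbers are more useful than the screenshot, the same heap figure can also be pulled from the nodes stats API, e.g.:

```
GET /_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent
```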
One other important point: we run a lot of scripted queries on ES, and this is what those scripted queries look like: https://www.dropbox.com/s/qlfem4qs04zchpg/scripted_queries_es.json?dl=0
I can't make these inline scripts parameter-driven because their definition isn't constant: it changes depending on the filters the user passes in from the front end, so some requests may have 1 filter, some 2, and some "N" filters (see the sketch below).
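To make that concrete, here is a hypothetical sketch of such a query (the field names are invented, not from our actual queries): with a different set of user filters, the Painless source string itself changes, so each request can produce a different inline script:

```
GET /sales-*/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['discount'].value > 10 && doc['region'].value == 'EU' && doc['qty'].value >= 2"
          }
        }
      }
    }
  }
}
```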
Will these kinds of scripted queries have any negative impact on system performance?