We have an ES setup with 3 master nodes and 6 data nodes, where each data node has 30 GB RAM and 8 cores, with 16 GB allocated to the JVM heap and the rest left for the filesystem cache (i.e. Lucene index caching).
We use ES mostly as a time-series database. We are not using Druid because our use case involves updating the dimensional data, i.e. updating documents whenever the product data changes.
Our index is de-normalized as per ES best practices, so the same document contains both the product info and the sale (or other) facts. We create one index per month, each with 6 shards and 2 replicas; every monthly index is about 30 GB in size and holds roughly 10 million documents, and we store 3 years' worth of data this way. At the service layer we have optimizations such as: if someone queries 3 months' worth of data, the query we fire at ES refers only to those 3 monthly indices.
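For illustration, a minimal sketch of that service-layer optimization, assuming the monthly indices follow a naming scheme like sales-YYYY.MM and ES is reachable at localhost:9200 (index and field names here are made up):

```python
import requests

# Hypothetical monthly index names; the real naming scheme may differ.
months = ["sales-2020.01", "sales-2020.02", "sales-2020.03"]

# A multi-index search that touches only the 3 monthly indices in the
# requested date range, instead of all 36 monthly indices.
resp = requests.post(
    "http://localhost:9200/{}/_search".format(",".join(months)),
    json={
        "size": 10,
        "query": {"term": {"product_id": "12345"}},  # product_id is a made-up field
    },
)
print(resp.json()["hits"]["total"])
```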
Swap is disabled on all the nodes, the open files limit is set to 131070 on all nodes, and we run a force merge on all newly created indices on a weekly basis to reduce the number of segments to 6.
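As a sketch, that weekly force merge is issued through the _forcemerge API (the index name is a placeholder):

```python
import requests

# Force merge a (hypothetical) monthly index down to at most 6 segments per shard.
resp = requests.post(
    "http://localhost:9200/sales-2020.01/_forcemerge",
    params={"max_num_segments": 6},
)
print(resp.status_code, resp.json())
```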
And now the main problem: as the attached image shows, JVM memory usage on all ES nodes is always above 40%, and the screenshot was taken when there was no load and no queries running on top of ES. I'm not sure how to debug this. Any pointers on how I can drill down to the root cause of the issue?
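For reference, this is roughly how I check per-node heap usage while the cluster is idle (a sketch using the _cat/nodes and node stats APIs):

```python
import requests

# Per-node heap usage summary.
print(requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"v": "true", "h": "name,heap.percent,heap.current,heap.max"},
).text)

# More detailed JVM stats per node.
stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()
for node in stats["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"])
```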
I can't make these inline scripts parameter-driven because their definition isn't constant: it changes depending on the filters passed in from the front end by the user. Some requests may have 1 filter, some 2, and some "N" filters.
Will these kinds of scripted queries have any negative impact on system performance?
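To make the question concrete, here is a rough sketch of the kind of dynamically built inline script query I mean (the filter fields, the index name, and the endpoint are all assumptions, and the fields are assumed to be keyword-mapped):

```python
import requests

# Filters arrive from the front end; their number and fields vary per request,
# so the Painless source is concatenated on the fly rather than parameterized.
filters = {"brand": "acme", "region": "emea"}  # made-up filter fields
source = " && ".join(
    "doc['{}'].value == '{}'".format(field, value) for field, value in filters.items()
)

query = {
    "query": {
        "bool": {
            "filter": {
                "script": {"script": {"lang": "painless", "source": source}}
            }
        }
    }
}
resp = requests.post("http://localhost:9200/sales-2020.01/_search", json=query)
print(resp.json()["hits"]["total"])
```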
Since 10G of your 16G heap is allocated to objects that are long lived (i.e. aren't going to get garbage collected), it makes sense that you don't see the JVM heap percentage drop below 60% even when the node is idle.
The next question is what is in the old gen. If you can't identify any obvious areas in your application (e.g., a plugin that creates an on-heap cache), then you could dump the JVM heap and inspect it manually (jmap/jconsole/jvisualvm) to get a better sense of the allocations.
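Before going as far as a full heap dump, a rough sketch of pulling the old-gen pool usage per node from the node stats API:

```python
import requests

# Old-gen heap pool usage per node, from the JVM node stats.
stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()
for node in stats["nodes"].values():
    old = node["jvm"]["mem"]["pools"]["old"]
    print(node["name"], old["used_in_bytes"], "/", old["max_in_bytes"])
```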
I assume you are using the default jvm.options file with the only change being ms/mx=16GB?
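If it helps, a quick way to confirm what heap the nodes were actually started with is the nodes info API (a sketch):

```python
import requests

# Confirm -Xms/-Xmx on every node: heap_init and heap_max should both be ~16 GB.
info = requests.get("http://localhost:9200/_nodes/jvm").json()
for node in info["nodes"].values():
    mem = node["jvm"]["mem"]
    print(node["name"], mem["heap_init_in_bytes"], mem["heap_max_in_bytes"])
```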
Oh, I was planning to add more indices. And by the way, is there any relationship between the number of nodes and the number of shards? For each index I have configured 6 shards and 2 replicas, since I have 6 machines in place, thinking that all machines will participate in query execution.
Also, on the heap size: I had gone through the blog post below earlier, where they say that for each GB of heap allocated we can have 20 shards, i.e. for a 16 GB heap we can have around 320 shards per node.
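For what it's worth, this is how I look at how the shards of one monthly index are spread over the 6 data nodes (the index name is a placeholder):

```python
import requests

# Show which node each primary/replica shard of a monthly index landed on.
print(requests.get(
    "http://localhost:9200/_cat/shards/sales-2020.01",
    params={"v": "true", "h": "index,shard,prirep,state,node"},
).text)
```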
"TIP: The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 to 25 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600-750 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health."
I have cut down the number of shards by 50% by dropping all the unused indices, but the heap is still at 40% even when nothing is running. Is that a common thing?
I think so. As that article says, "the further below this limit you can keep it the better". I would shoot for fewer shards per node and see what effect it has on your heap usage. Use 1 shard for each index unless the index won't fit on a single node.
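As a sketch, new monthly indices could be created with a single primary shard (the index name and replica count here are placeholders; adjust to your needs):

```python
import requests

# Create a new monthly index with 1 primary shard, keeping 2 replicas.
resp = requests.put(
    "http://localhost:9200/sales-2021.01",
    json={"settings": {"number_of_shards": 1, "number_of_replicas": 2}},
)
print(resp.json())
```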