Currently I'm running ES 1.7.2 on 3 nodes (64 GB RAM each: 30 GB to the ES heap and the rest to the OS file cache, 48 cores, and 10k RPM HDDs).
I know I currently don't have enough RAM for my cluster's use case: ingesting 3 TB of data daily via the bulk API. Indexes are created daily and we keep 7 days of data. ES can keep up with the ingest, but the problems start when analysts query the data and start filling the fielddata cache. During peak hours I see JVM heap usage surpass 85-90%, a lot of GC activity, and in HQ very high numbers of fielddata and filter evictions and slow refreshes and flushes on the indexes, up to the point where the cluster becomes unresponsive and I need to restart it to clear the cache.
Aside from adding SSDs, I would like to add more nodes with additional RAM, but I want a solid number for the RAM requirements based on my use case. Is there a way to measure how much fielddata is being used? How can I track ES RAM usage over a period of time (say, a week)? I know it can't be exact, since different search requests have different requirements, but I want a rough estimate of how much more RAM / how many more nodes I should add.
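For what it's worth, in 1.x you can read current fielddata usage per node from the node stats API (`GET /_nodes/stats/indices/fielddata`, or `GET /_cat/fielddata?v` for a quick look). One rough sizing approach is to sample that number every few minutes for a week, keep the cluster-wide peak, and size nodes so the peak fits under your fielddata budget. A minimal sketch (the host URL, the 30 GB heap, and the 75% fielddata fraction are my assumptions, not a tested recipe):

```python
import json
import math
import urllib.request

ES = "http://localhost:9200"  # assumption: point this at one of your nodes

def fielddata_bytes_per_node(host=ES):
    """Current fielddata memory per node, from the node stats API."""
    with urllib.request.urlopen(host + "/_nodes/stats/indices/fielddata") as r:
        stats = json.load(r)
    return {
        n["name"]: n["indices"]["fielddata"]["memory_size_in_bytes"]
        for n in stats["nodes"].values()
    }

def estimate_nodes(peak_cluster_fielddata_bytes,
                   heap_per_node_gb=30, fielddata_fraction=0.75):
    """Given the peak cluster-wide fielddata observed over the sampling
    window, estimate how many data nodes keep fielddata under budget
    (fielddata_fraction of each node's heap)."""
    budget_per_node = heap_per_node_gb * 2**30 * fielddata_fraction
    return max(math.ceil(peak_cluster_fielddata_bytes / budget_per_node), 1)
```

Cron something that records `sum(fielddata_bytes_per_node().values())` over the week, then feed the maximum into `estimate_nodes()`; e.g. a 150 GB peak against a 22.5 GB per-node budget (75% of a 30 GB heap) comes out at 7 nodes.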
The only tip I found was here: http://evertrue.github.io/blog/2014/11/16/3-performance-tuning-tips-for-elasticsearch/
We take the sum of all data node JVM heap sizes
We allocate 75% of the heap to indices.fielddata.cache.size
As our data set grows, if the utilization starts approaching that 75% barrier, we will add additional data nodes to spread the data and therefore the cache utilization horizontally.
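If you do cap the cache as that post suggests, in 1.7 it's a static setting in elasticsearch.yml (the 75% figure below is the blog's number, not mine). Note the 1.x default is an unbounded fielddata cache, which is likely why my heap keeps filling until GC thrashes:

```yaml
# elasticsearch.yml (per data node) -- static setting, requires a restart.
# Default in 1.x is unbounded, so fielddata grows until heap pressure/GC.
indices.fielddata.cache.size: 75%
```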
Unfortunately I can't add nodes on the fly, so I need to know my estimated usage in advance.
Any help would be greatly appreciated, thanks!