I'm running ES 1.7 on a 2-node cluster on CentOS 7.
Heap Size: 30GB out of 96GB total RAM.
533 total shards
1.7TB total index size
Fielddata size is set to 40% of heap, so 12GB
Issue: very long old-generation JVM GCs. When JVM heap usage goes above 90%, a GC takes 30+ seconds.
The only thing I've been able to do so far to remedy it is to clear the caches. This is not a sustainable long-term solution. The other thing I noticed is that total physical memory usage on the server never goes above 30-35%, which is basically the heap usage.
-- I see Indices Lucene Memory using 12GB of heap space. What makes up this 12GB?
-- Why don't I see Lucene using any "off heap" memory?
Does Indices Lucene Memory translate to segments?
If so, why are segments so large?
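To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It assumes the 40% fielddata setting is an upper bound (not necessarily what is in use at any moment) and that the reported 12GB of Lucene memory sits on the heap; the function name and the breakdown are illustrative only.

```python
def heap_budget(heap_gb, fielddata_percent, lucene_memory_gb):
    """Rough heap budget from the figures quoted in the post.

    fielddata_percent is treated as a cap, so this is a worst case,
    not a measurement of current usage.
    """
    fielddata_cap_gb = heap_gb * fielddata_percent / 100
    committed_gb = fielddata_cap_gb + lucene_memory_gb
    return {
        "fielddata_cap_gb": fielddata_cap_gb,
        "committed_gb": committed_gb,
        "headroom_gb": heap_gb - committed_gb,
        "committed_fraction": committed_gb / heap_gb,
    }

# Figures from the post: 30GB heap, 40% fielddata cap, 12GB Lucene memory.
budget = heap_budget(30, 40, 12)
print(budget)
```

With those inputs, fielddata at its cap plus Lucene memory would account for 24GB of the 30GB heap, leaving about 6GB for everything else. That would be consistent with heap usage sitting near 90%.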
You should reduce your shard count! This will help; ultimately, that is the best solution.
Thanks for the reply, Mark. What's the best way to do this? I've got a little over 155 indices and they all have 2-3 shards each. I've set replicas to 0 for now on all of them to help reduce the # of shards, but what's the best long-term solution? Do I just need to scale out to more servers?
And do you think that the # of shards is the reason why I get long GC times?
Give each new index fewer shards. With two nodes you should probably give each new index a single replica and one shard. You can't easily re-shard your existing indexes but I'd do so for new ones.
Thanks. I've also been closing indices that are older than x days. Would that help to reduce the memory pressure? Also, would running an optimize on the open indices help them use less heap memory as well?
These are infrastructure logs so there is a new index per day.
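As a sketch of the "close indices older than x days" approach with one index per day, the helper below picks out which index names fall past a cutoff. The `logstash-YYYY.MM.DD` naming pattern and the function name are assumptions for illustration; the thread does not say how the indices are named. Each selected index could then be closed with the close index API (`POST /<index>/_close`).

```python
from datetime import date, datetime, timedelta

def indices_to_close(index_names, today, max_age_days, prefix="logstash-"):
    """Return daily index names whose date is older than max_age_days.

    Assumes names look like '<prefix>YYYY.MM.DD' (a hypothetical pattern).
    Names that do not match the pattern are skipped, not closed.
    """
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave this index alone
        if day < cutoff:
            old.append(name)
    return old

names = ["logstash-2015.09.01", "logstash-2015.09.20", "kibana-int"]
print(indices_to_close(names, today=date(2015, 9, 30), max_age_days=14))
# → ['logstash-2015.09.01']
```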
Move to weekly indices, or reduce the shard count in the template you are using as well.
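For the template route, a minimal sketch of an index template body for ES 1.x with one primary shard and one replica might look like the following. The `logstash-*` pattern and the function name are placeholders, not something from this thread; the resulting JSON would be sent with `PUT /_template/<template_name>` and only affects indices created afterwards.

```python
import json

def build_log_template(pattern="logstash-*", shards=1, replicas=1):
    """Build an ES 1.x index template body with a reduced shard count.

    'pattern' is a hypothetical index name pattern; adjust it to match
    how the daily (or weekly) log indices are actually named.
    """
    return {
        "template": pattern,  # ES 1.x uses the "template" key for the pattern
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }

# Print the body that would be sent to PUT /_template/<template_name>.
print(json.dumps(build_log_template(), indent=2))
```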
Each shard is equivalent to one Lucene index and, in theory, it can hold around 2 billion documents. If the input rate is a few million documents a day, I don't think you'll need more than 2 shards per index. I hope this helps you reduce the number of shards, and maybe the number of indices too, by indexing data into a weekly index rather than a daily one.
Also, if you have a spare machine with less memory (let's say 8 to 16GB RAM), I suggest using it as a Master Node to relieve the burden on the master/data node combination, and you can put Kibana on this machine too. If you have another spare machine, I suggest using it as a Client Node to relieve the burden on the Master Node, because it looks like you have a lot of indices in this two-node cluster.
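The arithmetic behind this advice can be sketched as below. The 155-day retention window (taken from the "a little over 155 indices" figure) and the 5 million documents per day rate are assumptions for illustration, and the per-shard limit is approximated as 2^31 - 1 documents.

```python
import math

# Approximate upper bound on documents in one Lucene index (one shard).
LUCENE_MAX_DOCS_PER_SHARD = 2**31 - 1

def total_shards(num_indices, primaries, replicas):
    """Total shard copies in the cluster: primaries plus their replicas."""
    return num_indices * primaries * (1 + replicas)

def indices_for_retention(retention_days, days_per_index):
    """Number of indices needed to cover the retention window."""
    return math.ceil(retention_days / days_per_index)

def days_to_fill_shards(docs_per_day, primaries):
    """Days of ingest before the primaries of one index hit the doc limit."""
    return LUCENE_MAX_DOCS_PER_SHARD * primaries // docs_per_day

daily = indices_for_retention(155, 1)    # 155 daily indices
weekly = indices_for_retention(155, 7)   # 23 weekly indices
print(total_shards(daily, 1, 1))         # → 310
print(total_shards(weekly, 1, 1))        # → 46
print(days_to_fill_shards(5_000_000, 1)) # → 429
```

Under these assumptions a single-shard weekly index would hold about 35 million documents, far below the per-shard limit, while the total shard count would drop from the 533 reported in the original post to a few dozen.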
Actually, at this size you are better off having all 3 nodes as data+master.
I agree with you 100% @warkolm. I don't know about Melanie's case but I have a few cases where the customers can't afford to have 3 machines to start with.
Thanks for the reply guys.
I can likely get a 3rd server, but it's kinda hard to justify one when these 2 boxes are pretty underutilized when it comes to CPU and overall RAM usage, apart from Java using all of its allocated heap (ES is allocated 30GB heap; total available RAM on the box is 96GB). I read that Lucene uses off-heap RAM, so having lots available as file cache for Lucene will greatly improve full-text search. However, from what I can see, my overall RAM usage never goes much above what I've allocated to ES. I don't see Lucene using any off-heap memory at all for full-text search. Do you guys know more about this and how to get Lucene to use more off-heap memory?
Using doc values will reduce your overall heap usage.
Thanks, we've converted our non-analyzed string values to doc values.
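For reference, a minimal sketch of what such a mapping might look like in ES 1.x is shown below. The type name `logs` and the field names `host` and `service` are made up for illustration. A mapping like this would normally go into the index template so that it applies to newly created indices; data already indexed would need to be reindexed to pick it up.

```python
import json

def not_analyzed_doc_values_field():
    """ES 1.x mapping for a not_analyzed string stored as doc values."""
    return {"type": "string", "index": "not_analyzed", "doc_values": True}

def build_mapping(type_name, field_names):
    """Build a mapping where every listed field uses doc values.

    type_name and field_names are hypothetical; substitute the real ones.
    """
    return {
        type_name: {
            "properties": {
                name: not_analyzed_doc_values_field() for name in field_names
            }
        }
    }

print(json.dumps(build_mapping("logs", ["host", "service"]), indent=2))
```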
I still don't understand this and am hoping someone can help me understand it. I read that Lucene uses off-heap RAM, so having lots available as file cache for Lucene will greatly improve full-text search.
I have a 96GB RAM machine. I've allocated 32GB to elasticsearch. The rest should theoretically be used by Lucene. However, my overall RAM usage never goes much above 32GB. I don't see Lucene using any off heap memory at all for full text search. Does anyone know why and how to get Lucene to use more off heap memory?
That's handled by the OS caching the files it uses. What OS are you on?
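On Linux (the original post mentions CentOS 7), memory used by the file cache is reported by the kernel as `Cached` in `/proc/meminfo`, not as memory used by the Elasticsearch process, which may be why it does not show up in a "used RAM" figure. A small sketch for reading that value follows; the sample text is invented for illustration and is not output from the poster's servers.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values

def page_cache_gib(meminfo):
    """Size of the OS file cache in GiB, taken from the 'Cached' entry."""
    return meminfo.get("Cached", 0) / (1024 * 1024)

# Invented sample; on a real host use: open("/proc/meminfo").read()
SAMPLE = """MemTotal:       98304000 kB
MemFree:         2048000 kB
Buffers:          512000 kB
Cached:         62914560 kB
"""

print(page_cache_gib(parse_meminfo(SAMPLE)))  # → 60.0
```

If `Cached` is large on the real machines, the remaining RAM is already serving as file cache for Lucene's index files.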
Is there a setting I need to tune to allow Lucene to use the remaining RAM?