What does "Indices Lucene memory" mean? Using a large amount of heap space

Melanie_Zamora · February 24, 2016, 4:46pm

Hi,

I'm running ES 1.7 on a 2-node cluster on CentOS 7.
Heap Size: 30GB out of 96GB total RAM.
533 total shards
155 indices
1.7TB total index size
Fielddata size is set to 40% of heap, so 12GB

Issue: very long old-generation JVM GCs. When JVM heap usage goes above 90%, GC takes 30+ seconds.
The only thing I've been able to do to remedy it right now is to run a cache clear. This is not a sustainable long-term solution. The other thing I noticed is that total physical memory usage on the server never goes above 30-35%, which is basically heap usage.
-- I see Indices Lucene Memory using 12GB of heap space. What makes up this 12GB?
-- Why don't I see Lucene using any "off heap" memory?

Melanie_Zamora · February 24, 2016, 4:59pm

Does Indices Lucene Memory translate to segments?
If so, why are segments so large?

        "segments": {
           "count": 5175,
           "memory": "12.6gb",
           "memory_in_bytes": 13579396246,
           "index_writer_memory": "12.6mb",
           "index_writer_memory_in_bytes": 13228468,
           "index_writer_max_memory": "3.1gb",
           "index_writer_max_memory_in_bytes": 3394999889,
           "version_map_memory": "2.5mb",
           "version_map_memory_in_bytes": 2725444,
           "fixed_bit_set": "0b",
           "fixed_bit_set_memory_in_bytes": 0
        },
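To put the segment numbers above in perspective, here is a small Python sketch that simply divides them out. It only uses values from the posted stats; the 30 GiB heap is taken from the first post and is an assumption about how "30GB" was meant.

```python
import json

# Subset of the segment stats posted above, wrapped in braces so it
# parses as a standalone JSON object.
SEGMENTS_JSON = """
{
  "segments": {
    "count": 5175,
    "memory_in_bytes": 13579396246,
    "index_writer_memory_in_bytes": 13228468,
    "version_map_memory_in_bytes": 2725444
  }
}
"""

# Assumption: "30GB" heap read as 30 GiB, matching how ES prints sizes.
HEAP_BYTES = 30 * 1024 ** 3


def summarize_segments(stats_json: str, heap_bytes: int) -> dict:
    """Average Lucene segment memory per segment and its share of the heap."""
    seg = json.loads(stats_json)["segments"]
    per_segment = seg["memory_in_bytes"] / seg["count"]
    return {
        "segments": seg["count"],
        "avg_mib_per_segment": per_segment / 1024 ** 2,
        "heap_fraction": seg["memory_in_bytes"] / heap_bytes,
    }


if __name__ == "__main__":
    summary = summarize_segments(SEGMENTS_JSON, HEAP_BYTES)
    print(f"{summary['segments']} segments, "
          f"~{summary['avg_mib_per_segment']:.2f} MiB each, "
          f"{summary['heap_fraction']:.0%} of the heap")
```

With these inputs it works out to about 2.5 MiB per segment and a bit over 40% of the heap. Thousands of segments, each with its own in-heap overhead, is presumably why reducing the shard count comes up in the replies.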

warkolm · February 25, 2016, 5:46am

You should reduce your shard count! This will help; ultimately, that is the best solution.

Melanie_Zamora · February 25, 2016, 3:41pm

Thanks for the reply, Mark. What's the best way to do this? I've got a little over 155 indices, and they all have 2-3 shards each. I've set replicas to 0 on all of them for now to help reduce the number of shards, but what's the best long-term solution? Do I just need to scale out to more servers?
And do you think the number of shards is the reason I get long GC times?

nik9000 · February 25, 2016, 3:55pm

Give each new index fewer shards. With two nodes you should probably give each new index one shard and a single replica. You can't easily re-shard your existing indexes, but I'd do this for new ones.
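New indices pick up their shard and replica counts from a matching index template. The sketch below builds an ES 1.x-style template body (where the `"template"` key holds the index pattern) and prepares a `PUT /_template/<name>` request without sending it. The host, template name and `logs-*` pattern are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical values: adjust host, template name and index pattern.
ES_URL = "http://localhost:9200"
TEMPLATE_NAME = "daily_logs"
INDEX_PATTERN = "logs-*"


def build_template(pattern: str, shards: int = 1, replicas: int = 1) -> dict:
    """Index template body in the ES 1.x format ("template" holds the pattern)."""
    return {
        "template": pattern,
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


def template_request(url: str, name: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT /_template/<name> request."""
    return urllib.request.Request(
        f"{url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = template_request(ES_URL, TEMPLATE_NAME, build_template(INDEX_PATTERN))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply it to a cluster: urllib.request.urlopen(req)
```

A template only affects indices created after it is in place, which matches the advice to change new indices rather than re-shard existing ones.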

Melanie_Zamora · February 25, 2016, 4:00pm

Thanks. I've also been closing indices that are older than N days. Would that help reduce the memory pressure? Also, would running an optimize on the open indices help reduce heap usage as well?
These are infrastructure logs, so there is a new index per day.
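As far as I know, a closed index no longer keeps its segments open, so it stops contributing to segment memory on the heap. Picking which daily indices to close can be sketched as below; the `logs-YYYY.MM.DD` naming scheme is a hypothetical assumption, not something stated in the thread.

```python
from datetime import date, datetime, timedelta

# Hypothetical naming scheme: one index per day, e.g. "logs-2016.02.25".
PREFIX = "logs-"
DATE_FORMAT = "%Y.%m.%d"


def indices_to_close(index_names, today: date, keep_days: int):
    """Return daily indices dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue  # skip indices that don't follow the daily pattern
        try:
            day = datetime.strptime(name[len(PREFIX):], DATE_FORMAT).date()
        except ValueError:
            continue  # suffix is not a date
        if day < cutoff:
            old.append(name)
    return sorted(old)


if __name__ == "__main__":
    names = ["logs-2016.02.01", "logs-2016.02.20", "logs-2016.02.25", ".kibana"]
    for name in indices_to_close(names, date(2016, 2, 25), keep_days=7):
        # Each of these could then be closed with: POST /<name>/_close
        print(name)
```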

warkolm · February 25, 2016, 6:37pm

Move to weekly indices, or reduce the shard count in the template you are using as well.
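The effect of weekly indices and smaller templates on the total shard count is simple arithmetic. This sketch uses illustrative numbers loosely mirroring the first post (about 155 days of daily indices with up to 3 primaries each); a partial week is counted as a whole index.

```python
import math


def open_shards(retention_days: int, days_per_index: int,
                primaries: int, replicas: int) -> int:
    """Approximate shard copies needed to cover retention_days of data."""
    indices = math.ceil(retention_days / days_per_index)
    return indices * primaries * (1 + replicas)


if __name__ == "__main__":
    # Illustrative numbers only, no replicas.
    print("daily, 3 primaries: ", open_shards(155, 1, 3, 0))
    print("weekly, 1 primary:  ", open_shards(155, 7, 1, 0))
    print("weekly, 1 primary + 1 replica:", open_shards(155, 7, 1, 1))
```

Going from daily indices with 3 primaries to weekly indices with 1 primary cuts the count from 465 to 23 in this example, and even with one replica it stays under 50.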

thn · February 25, 2016, 6:40pm

Each shard is equivalent to one Lucene index, and in theory it can hold around 2 billion documents. If the input rate is a few million documents a day, I don't think you'll need more than 2 shards per index. Hopefully this helps you reduce the number of shards, and maybe the number of indices too, by indexing data into weekly rather than daily indices.
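The "around 2 billion" figure comes from Lucene using Java ints for document IDs. A rough sketch of the resulting minimum primary count follows; the 2**31 ceiling is an approximation, and `max_docs_per_shard` can be lowered to a self-imposed target, since the hard limit is only an upper bound.

```python
import math

# Approximate per-shard ceiling: Lucene doc IDs are Java ints, so one
# Lucene index (one ES shard) tops out at roughly 2**31 documents.
LUCENE_MAX_DOCS_APPROX = 2 ** 31


def min_primaries(docs_per_day: int, days_per_index: int,
                  max_docs_per_shard: int = LUCENE_MAX_DOCS_APPROX) -> int:
    """Smallest primary count that keeps every shard under the doc limit."""
    docs_per_index = docs_per_day * days_per_index
    return max(1, math.ceil(docs_per_index / max_docs_per_shard))


if __name__ == "__main__":
    # e.g. 5 million docs/day in weekly indices -> 35 million docs per index
    print(min_primaries(5_000_000, 7))
```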

Also, if you have a spare machine with less memory (say 8 to 16GB RAM), I suggest using it as a dedicated master node to relieve the burden on the combined master/data nodes; you can put Kibana on this machine too. If you have another spare machine, I suggest using it as a client node to take load off the master node, because it looks like you have a lot of indices in this two-node cluster.
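In ES 1.x these roles come down to two `elasticsearch.yml` settings, `node.master` and `node.data`: a client node has both set to false. The sketch below just renders those lines per role; the role labels on the left are made up for illustration.

```python
# ES 1.x node role settings for elasticsearch.yml.
# The role labels (dict keys) are hypothetical names for this sketch.
ROLES = {
    "master_only": {"node.master": True, "node.data": False},
    "client": {"node.master": False, "node.data": False},
    "master_data": {"node.master": True, "node.data": True},
}


def role_yaml(role: str) -> str:
    """Render the elasticsearch.yml lines for one node role."""
    settings = ROLES[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    for role in ROLES:
        print(f"# {role}")
        print(role_yaml(role))
```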

warkolm · February 25, 2016, 7:19pm

Actually, you are better off having all 3 nodes as data+master nodes at this size.
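One reason three master-eligible nodes work better than two is the quorum rule for `discovery.zen.minimum_master_nodes` in ES 1.x, usually set to (N / 2) + 1 master-eligible nodes. A small sketch of what that implies:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for ES 1.x discovery.zen.minimum_master_nodes: (N // 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, "nodes -> quorum", minimum_master_nodes(n),
              "tolerates", tolerated_master_failures(n), "failure(s)")
```

With two master-eligible nodes the quorum is 2, so losing either node blocks master election; with three, the cluster can lose one and still elect a master.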

thn · February 26, 2016, 11:21am

I agree with you 100%, @warkolm. I don't know about Melanie's case, but I've had a few cases where customers couldn't afford 3 machines to start with.

Melanie_Zamora · February 26, 2016, 3:37pm

Thanks for the reply guys.
I can likely get a 3rd server, but it's hard to justify when these 2 boxes are pretty underutilized in terms of both CPU and overall RAM usage, apart from Java using all of the heap allocated to it (ES has a 30GB heap; total RAM on each box is 96GB). I read that Lucene uses off-heap RAM, so having lots of it available as file cache will greatly improve full-text search. However, from what I can see, my overall RAM usage never goes much above what I've allocated to ES. I don't see Lucene using any off-heap memory at all for full-text search. Do you guys know more about this, and how to get Lucene to use more off-heap memory?

warkolm · February 26, 2016, 6:44pm

Using doc values will reduce your overall heap usage.

Melanie_Zamora · February 29, 2016, 6:06pm

Thanks, we've converted our not_analyzed string fields to doc values.
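In ES 1.x doc values are opt-in per field. A sketch of what such a mapping might look like is below; the `syslog` type and the field names are hypothetical. As far as I know, `doc_values` can't be switched on for an existing field, so a change like this normally goes into the index template and takes effect for new indices.

```python
import json

# Hypothetical type and field names; the field definition follows the
# ES 1.x syntax for not_analyzed strings stored as doc values.
DOC_TYPE = "syslog"
FIELDS = ["host", "program", "severity"]


def doc_values_mapping(doc_type: str, fields) -> dict:
    """Mapping that stores each listed string field as doc values."""
    return {
        doc_type: {
            "properties": {
                name: {"type": "string", "index": "not_analyzed", "doc_values": True}
                for name in fields
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(doc_values_mapping(DOC_TYPE, FIELDS), indent=2))
```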

Melanie_Zamora · February 21, 2017, 5:51pm

I still don't understand this and am hoping someone can help me understand it. I read that Lucene uses off-heap RAM, so having lots of it available as file cache will greatly improve full-text search.
I have a 96GB RAM machine and have allocated 32GB to Elasticsearch. The rest should theoretically be used by Lucene. However, my overall RAM usage never goes much above 32GB. I don't see Lucene using any off-heap memory at all for full-text search. Does anyone know why, and how to get Lucene to use more off-heap memory?

warkolm · February 21, 2017, 8:23pm

That's handled by the OS caching the files Lucene uses. What OS are you on?
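On Linux, memory used to cache Lucene's files shows up as page cache (the `Buffers` and `Cached` lines in `/proc/meminfo`), which many tools report separately from process memory, or even count as available. One possible reason usage "never goes above 32GB" is that the monitoring only counts application memory. A minimal sketch for reading the cache size directly:

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo lines like 'Cached:  1234 kB' into integers (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def page_cache_gib(meminfo: dict) -> float:
    """Memory the kernel uses for buffers and cached file pages, in GiB."""
    return (meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)) / 1024 ** 2


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):  # Linux only
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"page cache: {page_cache_gib(info):.1f} GiB")
```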

Melanie_Zamora · February 21, 2017, 8:37pm

I'm running on CentOS 7.

Melanie_Zamora · February 21, 2017, 8:39pm

Is there a setting I need to tune to allow Lucene to use the remaining RAM?

Related topics

Topic	Category	Replies	Views	Activity
Large index size cause high Java heap occupation?	Elasticsearch	13	4969	July 5, 2017
Memory usage per index	Elasticsearch	9	10243	July 6, 2017
Frequent GC and OOM due to many fields	Elasticsearch	9	1523	July 5, 2017
Large heap usage with each node	Elasticsearch	15	3751	July 5, 2017
Memory utilization - predicting 'out of heap space' errors	Elasticsearch	13	426	July 6, 2017