Frequent GC and OOM due to many fields

muralikpbhat · October 25, 2016, 4:16am

We have an Elasticsearch installation on a tiny box with 256MB of heap space. There are many shards (400+) on a single node and a total of 8k+ fields. We see GC running every minute and frequent OutOfMemoryErrors. This may be expected with so little memory and so many fields, but I wanted to know whether there are knobs to control memory usage.

Heap dump analysis shows that most of the heap is byte arrays:

Class                                             Count
org.apache.lucene.util.BytesRef                   61009
org.apache.lucene.util.fst.FST                    13753

Mainly referenced by:
Class                                             Count
org.apache.lucene.codecs.blocktree.FieldReader    41259
org.apache.lucene.util.fst.FST                    13753

Is there a way to control these data structures in Lucene/ES?

mainec · October 25, 2016, 10:46am

Can you explain your use case? Having so many shards and so many fields on a single node with so little memory seems odd.

For general hints on sizing see https://www.elastic.co/blog/found-sizing-elasticsearch and https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html

Hope this helps,
Isabel

muralikpbhat · October 26, 2016, 3:11am

Thanks for the response.
You can think of them as time series indices, one index per day. We wanted to see how far we can push it before we start deleting the old indices. Please note that performance is not a consideration. So I tried changing term_index_divisor to see whether we could reduce what is loaded into memory, but support for it has been removed: https://github.com/elastic/elasticsearch/pull/4379/commits/6c189310b9b299defc0746576c7d91d4c5c3d576

Any other way to tune the memory usage in ES or Lucene would be really helpful.

jprante · October 26, 2016, 8:15am

It's not a question of tuning. You are squeezing 400+ shards into a single node with a small heap. Just use 1 shard and you will be happy.

Christian_Dahlqvist · October 26, 2016, 8:22am

As already suggested, you have far too many shards, which results in a fair amount of overhead. Time-based indices are useful for managing retention, but you probably want to reduce the number of shards per index to 1 and also switch to monthly or weekly rather than daily indices.
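
As a rough illustration of that suggestion, here is a minimal Python sketch that builds a monthly index name and an index template body with a single primary shard. The `logs-` prefix, the helper names, and the choice of zero replicas (a single node cannot allocate replicas anyway) are assumptions made for the example, not details from this thread. The body uses the `template` key for the index pattern, as in Elasticsearch 2.x, and would be sent with `PUT /_template/<name>`.

```python
import json
from datetime import date


def monthly_index_name(day: date, prefix: str = "logs-") -> str:
    """Return the monthly index that a document dated `day` would go to.

    The "logs-" prefix is a hypothetical naming scheme.
    """
    return f"{prefix}{day:%Y.%m}"


def single_shard_template(prefix: str = "logs-") -> dict:
    """Build an index template body with one primary shard per index."""
    return {
        # Elasticsearch 2.x templates match index names via "template".
        "template": f"{prefix}*",
        "settings": {
            "number_of_shards": 1,
            # Assumption: no replicas, since there is only one node.
            "number_of_replicas": 0,
        },
    }


if __name__ == "__main__":
    print(monthly_index_name(date(2016, 10, 26)))
    print(json.dumps(single_shard_template(), indent=2))
```

A template only affects indices created after it is stored; existing daily indices would need to be reindexed or left to age out.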

muralikpbhat · October 26, 2016, 8:59am

Thanks for the suggestions. Given the heap dump, won't this still happen with one shard if it has too many fields or too many terms for a field? This looks like a data structure that Lucene keeps in memory for the term dictionary, and it grows as the number of terms or fields grows. Having more shards and indices is probably making it worse.

So I am looking for options to reduce what goes into memory. For example, can we increase the terms per block in the Lucene postings format?

Christian_Dahlqvist · October 26, 2016, 9:02am

Which version of Elasticsearch are you using?

muralikpbhat · October 26, 2016, 9:28am

2.3.2

Christian_Dahlqvist · October 26, 2016, 9:35am

Since you are on Elasticsearch 2.x, doc_values will be enabled by default, which reduces heap pressure. Reducing the number of shards and ensuring the average shard size is in the GB range is therefore the way to go. Please note that a heap size of at least 1 or 2GB is recommended for any kind of production system.
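
To give a feel for how much the shard count can drop, here is a small back-of-the-envelope sketch. The 90-day retention period and the 5 primary shards per daily index (the Elasticsearch 2.x default) are assumptions for illustration only; the thread does not state the actual retention or shards per index.

```python
import math


def shard_count(retention_days: int, days_per_index: int, shards_per_index: int) -> int:
    """Number of primary shards held for a given retention period."""
    indices = math.ceil(retention_days / days_per_index)
    return indices * shards_per_index


if __name__ == "__main__":
    retention = 90  # hypothetical retention period in days
    print("daily, 5 shards:  ", shard_count(retention, 1, 5))
    print("weekly, 1 shard:  ", shard_count(retention, 7, 1))
    print("monthly, 1 shard: ", shard_count(retention, 30, 1))
```

Each shard is a separate Lucene index with its own per-field term dictionary structures, so fewer shards means fewer copies of that per-field overhead on the heap.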
