Segments memory_in_bytes excessively large with a lot of open indices

Hi,

Ever since we moved to keeping all indices open on our ES 1.7 ELK cluster, we've seen a large increase in GC runs.

- We have 1000+ indices, mostly with 3 shards
- 16,408 segments in total
- Index buffer configured to 20%
- Field data cache fixed at 2GB (a very questionable setting, we know)
- Shards get optimized down to 2 segments after 1 day
- This runs on 14 machines with 14 instances of ES, each with a 30GB heap
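
For reference, the relevant bits of that setup look roughly like this (a sketch: the setting names are the stock ones, the index name in the optimize call is only an example):

    # elasticsearch.yml (our values)
    indices.memory.index_buffer_size: 20%
    indices.fielddata.cache.size: 2gb

    # daily optimize of the previous day's indices (index name is illustrative)
    curl -XPOST 'localhost:9200/logstash-2016.01.01/_optimize?max_num_segments=2'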

It seems that most of the heap is consumed by segment-related memory.
Output from 1 node:

    "segments" : {
      "count" : 1011,
      "memory_in_bytes" : 20007750054,
      "index_writer_memory_in_bytes" : 10224032,
      "index_writer_max_memory_in_bytes" : 6685990083,
      "version_map_memory_in_bytes" : 222936,
      "fixed_bit_set_memory_in_bytes" : 0
    }

The mappings we use are fairly standard with a fixed number of fields. We assume disabling norms will help us a bit - is that worth considering? Anything else we could look at?
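
In 1.x that would be something like this per string field (a sketch; index, type and field names are placeholders):

    curl -XPUT 'localhost:9200/logs-example' -d '{
      "mappings": {
        "logline": {
          "properties": {
            "message": {
              "type": "string",
              "norms": { "enabled": false }
            }
          }
        }
      }
    }'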


Hi, my guess is that it's the terms memory. If so, that's trouble, because I haven't found any answer on how to manage this situation.

In 2.x the segments memory is shown in more detail:

"segments": {
"count": 8337,
"memory": "7.9gb",
"memory_in_bytes": 8570235911,
"terms_memory": "7.6gb",
"terms_memory_in_bytes": 8190950715,
"stored_fields_memory": "360.9mb",
"stored_fields_memory_in_bytes": 378518192,
"term_vectors_memory": "0b",
"term_vectors_memory_in_bytes": 0,
"norms_memory": "0b",
"norms_memory_in_bytes": 0,
"doc_values_memory": "749kb",
"doc_values_memory_in_bytes": 767004,
"index_writer_memory": "84.9mb",
"index_writer_memory_in_bytes": 89033244,
"index_writer_max_memory": "2gb",
"index_writer_max_memory_in_bytes": 2210971648,
"version_map_memory": "54.5mb",
"version_map_memory_in_bytes": 57193272,
"fixed_bit_set": "0b",
"fixed_bit_set_memory_in_bytes": 0
},
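
For the record, that output comes from the stats API with the `human` flag, which adds the pretty-printed values next to the raw byte counts (assuming a node on localhost:9200):

    # per node
    curl -s 'localhost:9200/_nodes/stats/indices/segments?human&pretty'
    # or per index
    curl -s 'localhost:9200/_stats/segments?human&pretty'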

How many documents are in your indices, and how many fields are indexed?

Can't verify using the API since we're on 1.7. Unless, of course, the *.tim files on disk are fully loaded into the heap - then I could make a calculation.
Would you mind verifying that on your cluster (does `du` over the *.tim files equal terms_memory_in_bytes)?
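
Something along these lines should do it (a sketch: the data path is an assumption, adjust it to your path.data, and the terms breakdown in the stats needs 2.x):

    # total size of the on-disk terms dictionaries (GNU du)
    find /var/lib/elasticsearch -name '*.tim' -print0 | du -bc --files0-from=- | tail -n 1

    # what the API reports
    curl -s 'localhost:9200/_nodes/stats/indices/segments?pretty' | grep terms_memory_in_bytes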

Fields: anywhere between 20 and 60.
Docs per index: anywhere between a few and 250 million.

These ranges are fairly broad because we split various types of logging into separate indices (for various reasons).

From How to decrease terms_memory footprint

As an option I can open and close older indices at query time (open -> search -> close), but IMHO it's a time/resource-consuming decision

Wouldn't be an option at all for us, considering some of our indices are 100+ GB.
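
For context, that quoted workflow is just the open/close index APIs wrapped around every query, roughly (index name is illustrative):

    curl -XPOST 'localhost:9200/logstash-2015.12.01/_open'
    # (in practice you would wait for the index to recover before searching)
    curl -XGET 'localhost:9200/logstash-2015.12.01/_search' -d '{ "query": { "match_all": {} } }'
    curl -XPOST 'localhost:9200/logstash-2015.12.01/_close'

Having to recover a 100+ GB index on every query is exactly the cost we can't afford.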

I've calculated the sum of the *.tim files: it's 51,837 MB, while the terms memory reported by the API is 7,812 MB. So the terms held in memory are about 6.6 times smaller than the on-disk total.

Wouldn't be an option at all for us, considering some of our indices are 100+ GB

Another option is to increase the number of nodes or the heap memory towards infinity, for an infinite number of $ :slightly_smiling: It's just too costly.

The hard way is to find the right place (the ES/Lucene codec?) and rewrite it to keep fewer terms in memory (which could also slow down searches).
