Segments memory_in_bytes goes down after reduction of segments + restart

Hi,

Related to "Segments memory_in_bytes excessively large with a lot of open indices".

We reduced the number of segments in our cluster but saw zero effect on segments memory_in_bytes. As a last resort we did a rolling restart and, lo and behold, segments memory usage dropped significantly and GC in OldGen went back to a healthy zigzag pattern.

This is on ES 1.7.3.

Is this behaviour in any way expected?

Hi, following up on our previous conversation: if this issue is related to terms held in memory, that depends on the number of terms indexed and has no relation to the number of segments in the index.

I'm still waiting for developers / moderators who can help and bring some insight into these two topics.

I have no means of proving it's due to terms memory, so for now I'm assuming some kind of per-segment overhead.

Could you try 2.x with a small subset of data? It provides much more detailed info about segment memory distribution ( https://github.com/elastic/elasticsearch/issues/11495 ), so you can see what's inside. But don't forget to disable doc_values if you don't use them in 1.7.
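
In case it helps, here is a minimal Python sketch for pulling that breakdown from the indices stats API (GET /<index>/_stats/segments). The base URL and index name are placeholders, the helper names are hypothetical, and it assumes the usual stats response layout (indices -> <index> -> primaries -> segments):

```python
import json
from urllib.request import urlopen


def segments_stats_url(base_url, index):
    """Build the indices stats URL restricted to the segments metric."""
    return f"{base_url.rstrip('/')}/{index}/_stats/segments"


def primaries_segments(stats, index):
    """Pick the primaries' segments section for one index out of a stats response."""
    return stats["indices"][index]["primaries"]["segments"]


def fetch_segments(base_url, index):
    """Fetch and return the segments stats (needs a reachable cluster)."""
    with urlopen(segments_stats_url(base_url, index)) as resp:
        return primaries_segments(json.load(resp), index)


# Example call, assuming a node on localhost:9200 and an index called "myindex":
# print(json.dumps(fetch_segments("http://localhost:9200", "myindex"), indent=2))
```

On 1.7 the same call should only return the overall memory_in_bytes; the per-component fields are what 2.x adds.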

Results from testing a single index (we have multiple datasets hosted in a single cluster with wildly different characteristics) on ES2.2.

After import of small dataset:

"segments" : { "count" : 96, "memory_in_bytes" : 9899804, "terms_memory_in_bytes" : 7553284, "stored_fields_memory_in_bytes" : 2129024, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 30912, "doc_values_memory_in_bytes" : 186584, "index_writer_memory_in_bytes" : 1330056, "index_writer_max_memory_in_bytes" : 426010210, "version_map_memory_in_bytes" : 45314, "fixed_bit_set_memory_in_bytes" : 0 },

After reduction of segments by calling _optimize?max_num_segments=1:
"segments" : { "count" : 5, "memory_in_bytes" : 7408450, "terms_memory_in_bytes" : 5273278, "stored_fields_memory_in_bytes" : 2117808, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 1856, "doc_values_memory_in_bytes" : 15508, "index_writer_memory_in_bytes" : 0, "index_writer_max_memory_in_bytes" : 2560000, "version_map_memory_in_bytes" : 0, "fixed_bit_set_memory_in_bytes" : 0 },
Then a node restart:
"segments" : { "count" : 5, "memory_in_bytes" : 7408450, "terms_memory_in_bytes" : 5273278, "stored_fields_memory_in_bytes" : 2117808, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 1856, "doc_values_memory_in_bytes" : 15508, "index_writer_memory_in_bytes" : 0, "index_writer_max_memory_in_bytes" : 426010210, "version_map_memory_in_bytes" : 0, "fixed_bit_set_memory_in_bytes" : 0 },

In short, fewer segments after merging results in lower terms_memory_in_bytes, which is somewhat surprising.
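
To put numbers on that, a small Python sketch of the relative change per component. The values are copied from the ES 2.2 output above; the helper name is only for illustration:

```python
# Segment stats copied from the ES 2.2 test above (before and after the merge).
before = {
    "terms_memory_in_bytes": 7553284,
    "stored_fields_memory_in_bytes": 2129024,
    "norms_memory_in_bytes": 30912,
    "doc_values_memory_in_bytes": 186584,
    "memory_in_bytes": 9899804,
}
after = {
    "terms_memory_in_bytes": 5273278,
    "stored_fields_memory_in_bytes": 2117808,
    "norms_memory_in_bytes": 1856,
    "doc_values_memory_in_bytes": 15508,
    "memory_in_bytes": 7408450,
}


def relative_drop(old, new):
    """Return the reduction from old to new as a percentage of old."""
    return 100.0 * (old - new) / old


drops = {key: relative_drop(before[key], after[key]) for key in before}

for key, pct in drops.items():
    print(f"{key}: {before[key]} -> {after[key]} ({pct:.1f}% lower)")
```

Terms memory accounts for most of the absolute drop (about 2.3 MB of the roughly 2.5 MB total), even though norms and doc values shrink more in relative terms.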

I performed the same tests on ES 1.7.3. Instead of paying attention to terms_memory_in_bytes, I checked segments.memory_in_bytes.

Basically, the behaviour is exactly the same as on ES 2.2. In other words, I was not able to reproduce our production issue.

Hmm, interesting results. In your case norms and doc_values decreased greatly, but they are small (1-2%) compared with terms memory and stored fields memory. (You can see that memory_in_bytes is the sum of the other elements, excluding the index writer and version map entries.)
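
A quick way to check that sum, as a Python sketch using the numbers posted above. Which fields count as "components" is my reading of the output, not something taken from the Elasticsearch source:

```python
# Fields that appear to add up to memory_in_bytes (an assumption based on the
# posted numbers; index writer and version map are left out).
COMPONENT_KEYS = (
    "terms_memory_in_bytes",
    "stored_fields_memory_in_bytes",
    "term_vectors_memory_in_bytes",
    "norms_memory_in_bytes",
    "doc_values_memory_in_bytes",
)


def component_sum(segments):
    """Add up the per-component memory fields of one segments stats section."""
    return sum(segments[key] for key in COMPONENT_KEYS)


after_import = {
    "memory_in_bytes": 9899804,
    "terms_memory_in_bytes": 7553284,
    "stored_fields_memory_in_bytes": 2129024,
    "term_vectors_memory_in_bytes": 0,
    "norms_memory_in_bytes": 30912,
    "doc_values_memory_in_bytes": 186584,
}
after_optimize = {
    "memory_in_bytes": 7408450,
    "terms_memory_in_bytes": 5273278,
    "stored_fields_memory_in_bytes": 2117808,
    "term_vectors_memory_in_bytes": 0,
    "norms_memory_in_bytes": 1856,
    "doc_values_memory_in_bytes": 15508,
}

for label, seg in (("after import", after_import), ("after optimize", after_optimize)):
    total = component_sum(seg)
    print(label, total, "matches" if total == seg["memory_in_bytes"] else "differs")
```

Both snapshots match exactly, so the five component fields fully explain memory_in_bytes in this test.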

Here is some info from my cluster (ES 2.1.0):

"primaries": {
      "docs": {
        "count": 15471184885,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 1365688947337
      },
      ...
      "segments": {
        "count": 864,
        "memory_in_bytes": 3622979823,
        "terms_memory_in_bytes": 3390816879,
        "stored_fields_memory_in_bytes": 232083456,
        "term_vectors_memory_in_bytes": 0,
        "norms_memory_in_bytes": 0,
        "doc_values_memory_in_bytes": 79488,
        ...
        "version_map_memory_in_bytes": 0,
        "fixed_bit_set_memory_in_bytes": 0
      },

And another index of the same type:

"primaries": {
      "docs": {
        "count": 14585838030,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 1310676837479
      },
      ...
      "segments": {
        "count": 1039,
        "memory_in_bytes": 3463054649,
        "terms_memory_in_bytes": 3245432949,
        "stored_fields_memory_in_bytes": 217526112,
        "term_vectors_memory_in_bytes": 0,
        "norms_memory_in_bytes": 0,
        "doc_values_memory_in_bytes": 95588,
        ...
        "version_map_memory_in_bytes": 214694604,
        "fixed_bit_set_memory_in_bytes": 0
      },

In this case there are more segments but less memory is used. I suppose memory_in_bytes correlates with doc count (and with how unique your indexed field values are).
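
A rough Python sketch of that comparison, using only the numbers from the two stats outputs above (the labels index_a / index_b are mine):

```python
# Doc count, segment count and segments memory_in_bytes from the two indices above.
indices = {
    "index_a": {"docs": 15471184885, "segments": 864, "memory_in_bytes": 3622979823},
    "index_b": {"docs": 14585838030, "segments": 1039, "memory_in_bytes": 3463054649},
}


def bytes_per_doc(stats):
    """Segment memory divided by document count."""
    return stats["memory_in_bytes"] / stats["docs"]


def bytes_per_segment(stats):
    """Segment memory divided by segment count."""
    return stats["memory_in_bytes"] / stats["segments"]


for name, stats in indices.items():
    print(
        f"{name}: {bytes_per_doc(stats):.4f} bytes/doc, "
        f"{bytes_per_segment(stats):.0f} bytes/segment"
    )
```

Both indices come out at roughly 0.23 to 0.24 bytes of segment memory per document, while memory per segment differs by about a fifth, which fits the doc-count explanation better than a segment-count one.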

Maybe in your case some segments share the same terms, or something else is duplicated in memory across different segments. After the segment reduction the identical terms are merged, so you gain a little memory. But that's just my guess.