Reduce Lucene segment heap memory footprint

I have a cluster of 12 data nodes plus 3 master nodes, running ES 2.3.1, with 5000 shards cluster-wide (about 410 shards per node) and roughly 250 billion small documents.
Shards are roughly 8GB in size and well balanced (small variance in shard size across all indices).
Shards are also optimized to 6 segments.
Data nodes have 64GB of main memory, with 31GB allocated to the ES heap.

Currently, when the cluster is idle, heap consumption on every data node is around 20GB (out of 31GB).
I believe almost all of this memory goes to the segments' in-memory structures: the query and field data caches are empty, and according to the _segments API every shard reports about 50MB of memory_in_bytes, so 50MB * 410 ≈ 20GB.
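For reference, here is a minimal sketch of that accounting: it sums memory_in_bytes over all segments returned by the _segments API and groups the totals per node. The localhost:9200 endpoint and the use of the Python requests library are assumptions for illustration, not part of my setup description.

```python
# Sketch: total segment memory per node, as reported by the _segments API.
# Assumes an ES 2.x-style response layout (indices -> shards -> segments)
# and a cluster reachable on localhost:9200 (adjust as needed).
from collections import defaultdict

import requests

resp = requests.get("http://localhost:9200/_segments")
resp.raise_for_status()
data = resp.json()

heap_by_node = defaultdict(int)  # node id -> total segment memory in bytes

for index in data["indices"].values():
    for shard_copies in index["shards"].values():
        for shard in shard_copies:  # primaries and replicas
            node = shard["routing"]["node"]
            for segment in shard["segments"].values():
                heap_by_node[node] += segment["memory_in_bytes"]

for node, total in sorted(heap_by_node.items()):
    print("{}: {:.1f} GB of segment memory".format(node, total / 1024 ** 3))
```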

doc_values are disabled for all indices to save space, and the cluster is used for non-aggregation queries only.
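For context, this is roughly what "doc_values disabled" looks like in an ES 2.x mapping: each non-analyzed field carries "doc_values": false. The index name, type name, and field names below are hypothetical examples, not my actual mappings.

```python
# Hypothetical ES 2.x index creation with doc_values disabled per field.
# Index/type/field names are made up for illustration only.
import json

import requests

mapping = {
    "mappings": {
        "event": {
            "properties": {
                "user_id":   {"type": "long", "doc_values": False},
                "tag":       {"type": "string", "index": "not_analyzed",
                              "doc_values": False},
                "timestamp": {"type": "date", "doc_values": False},
            }
        }
    }
}

resp = requests.put("http://localhost:9200/my_index", data=json.dumps(mapping))
print(resp.json())
```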

Is there a way to control the memory_in_bytes of Lucene segments?
I dug up some old posts about termInfosIndexDivisor, but they seem to be irrelevant today.

Thanks in advance

Not using doc_values where possible will generally drive up heap usage quite a bit. As your shard count and average size seem quite reasonable, I would suspect this might be what is causing problems.

The field data cache is empty, and no aggregation or sort queries are executed on the cluster.
Only simple queries with simple filters.

If that's what's causing the high heap consumption, shouldn't that show up in the field data cache?

I'd start by reducing the shard count; do you really need them to be that small?

They are small because the documents are small, but they do contain approximately 50 million documents each, which provides excellent latencies for our expected workloads over SATA and SAS storage disks.

If I were to reduce the shard count by a factor of two, for example, I suspect latency would increase.

But even if I were to do that, why would it reduce heap memory usage? Analyzing the results I get from the _segments API, I can see a clear linear correlation between a segment's document count and its memory_in_bytes metric. So if my shards were to grow to 100 million documents, I would expect the segments' memory usage to grow as well.
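A rough sketch of that correlation check: collect (num_docs, memory_in_bytes) pairs for every segment from the _segments API and compute a Pearson correlation coefficient. Again, the endpoint and the requests library are assumptions for illustration.

```python
# Sketch: Pearson correlation between segment doc count and memory_in_bytes,
# using the same _segments response layout as above.
from math import sqrt

import requests

data = requests.get("http://localhost:9200/_segments").json()

pairs = []  # (segment doc count, segment memory_in_bytes)
for index in data["indices"].values():
    for shard_copies in index["shards"].values():
        for shard in shard_copies:
            for seg in shard["segments"].values():
                pairs.append((seg["num_docs"], seg["memory_in_bytes"]))

n = len(pairs)
mean_docs = sum(d for d, _ in pairs) / n
mean_mem = sum(m for _, m in pairs) / n
cov = sum((d - mean_docs) * (m - mean_mem) for d, m in pairs)
var_docs = sum((d - mean_docs) ** 2 for d, _ in pairs)
var_mem = sum((m - mean_mem) ** 2 for _, m in pairs)

r = cov / sqrt(var_docs * var_mem)
print("Pearson r between segment doc count and memory_in_bytes: {:.3f}".format(r))
```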

Finally, I expect the cluster to grow at a steady rate of approximately half a billion documents a day, uniformly across the cluster, so the shards should double in size in about a year and a half (roughly 250 billion new documents over ~500 days, matching the current total).

I should probably mention that we do not use time-based indices, but rather a specialized partitioning scheme that benefits our most common workloads.
