We have a heterogeneous cluster setup where a large set of powerful hot data nodes ingest daily data live and store about 5 days' worth of data (~15 TB or so). After 5 days we migrate the data to a smaller set of less powerful but larger-in-storage warm nodes. These warm nodes retain the data for 25 days before we delete it.
Our warm nodes are d2.2xl instances (60 GB memory, 12 TB local storage), and we dedicate half the memory to ES (so a 30 GB heap). Each node is storing roughly ~5 TB of data, and we have 14 nodes right now. Across those 14 nodes we have 25 days of indices (×2, because we store two types of data each day). This nets out to about 700 shards and about 70 TB of raw data across the machines. From a performance perspective this is fine because this data is generally not searched frequently, so we don't mind if the queries are slow.
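Rough arithmetic on those numbers, in case it helps; the shards-per-index figure below is just how the totals divide out, not an exact per-index count:

```python
# Back-of-the-envelope check of the warm tier numbers above.
# The shards-per-index split is inferred from the totals, not a measured value.
days_retained = 25
index_types_per_day = 2
total_indices = days_retained * index_types_per_day  # 50 indices on the warm tier
shards_per_index = 700 / total_indices               # ~14 shards per index (primaries + replicas)
tb_per_node = 70.0 / 14                              # ~5 TB of raw data per warm node
print(total_indices, shards_per_index, tb_per_node)
```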
The problem we've got is that the JVM on each of these machines is running right at about 80% heap used, and there is no garbage collection sawtooth pattern .. the machines just hold steady at 80%. After digging, we realized that it looks like we're storing an average of 22 GB of data in memory for terms.
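For reference, the same terms number can be pulled straight from the nodes stats API rather than a dashboard; here's a rough sketch (the host below is a placeholder for one of our warm nodes):

```python
# Sketch: per-node Lucene segment memory (terms, total) from the nodes stats API.
# The host/port is a placeholder; adjust for your own cluster.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/indices/segments")
for node_id, node in resp.json()["nodes"].items():
    seg = node["indices"]["segments"]
    print(
        node.get("name", node_id),
        "terms_memory:", seg["terms_memory_in_bytes"] // 2**30, "GiB,",
        "segment memory total:", seg["memory_in_bytes"] // 2**30, "GiB",
    )
```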
So, the ultimate question here .. is it possible for me to tune ES to store fewer terms in the JVM heap? Or is this a simple limitation .. the more data we have, the more heap we simply have to have available? Can we not make some kind of performance tradeoff here?
What do your mappings look like? Do you have a lot of analyzed fields? Are you using doc_values for not_analyzed strings (which is the default in ES 2.4.2)?
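If it helps, a quick way to spot analyzed fields is to pull the mapping for one index and list the string fields that are not marked not_analyzed; a rough sketch (index name and host are placeholders):

```python
# Sketch: list string fields in an index mapping that are analyzed
# (i.e. type "string" without index: not_analyzed). ES 2.x mapping layout.
# Index name and host are placeholders.
import requests

index = "logs-2016.12.01"
mapping = requests.get("http://localhost:9200/{}/_mapping".format(index)).json()

def walk(props, path=""):
    for name, field in props.items():
        full = "{}.{}".format(path, name) if path else name
        if field.get("type") == "string" and field.get("index") != "not_analyzed":
            print("analyzed string field:", full)
        walk(field.get("properties", {}), full)  # nested object fields
        walk(field.get("fields", {}), full)      # multi-fields (e.g. .raw)

for idx_name, idx_body in mapping.items():
    for doc_type, type_mapping in idx_body["mappings"].items():
        walk(type_mapping.get("properties", {}), doc_type)
```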
Our mapping is pretty dynamic .. and indeed we have a ton of fields. On an average day, we generate probably 700-1000 unique fields on a given index. While I can't share a full actual generated mapping for an index, here's our template:
It looks like the vast majority of your string fields are dynamically typed, which will result in one analyzed and one not_analyzed version of each. You also have the _all field enabled, which is an analyzed field. Depending on the number of string fields, this is likely to be behind a lot of your heap usage.
The best way to reduce heap usage is probably going to be to go through your mappings and explicitly map fields that are only really used in a not_analyzed capacity as such.
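As a starting point, something along these lines in an index template would map newly added string fields as not_analyzed (with doc_values) and turn off _all. The template name, index pattern and host below are placeholders, so adapt them to your setup:

```python
# Sketch: ES 2.x index template that maps dynamically added string fields as
# not_analyzed (backed by doc_values) and disables the _all field.
# Template name, index pattern and host are placeholders.
import requests

template = {
    "template": "logs-*",
    "mappings": {
        "_default_": {
            "_all": {"enabled": False},
            "dynamic_templates": [
                {
                    "strings_not_analyzed": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed",
                            "doc_values": True,
                        },
                    }
                }
            ],
        }
    },
}

resp = requests.put("http://localhost:9200/_template/warm_logs", json=template)
print(resp.status_code, resp.json())
```

Keep in mind that a template change only applies to indices created after it is in place, so existing indices would need to be reindexed to benefit.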
@Christian_Dahlqvist, thanks for the feedback. I do understand that we are analyzing almost every field, but I definitely thought that had more of an impact on disk usage and CPU time (at indexing time) than on long-term sustained memory use. Out of curiosity, is there a way for us to tell how much heap is being used by a particular index? We can certainly try an experiment where we stop analyzing all the fields and instead only analyze a few specific ones ... but I would like a measurable way of seeing the difference.
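For what it's worth, the segments section of the per-index stats API does break this down, so that may be one way to measure it; a rough sketch (index name and host are placeholders):

```python
# Sketch: per-index segment memory breakdown from the index stats API.
# Index name and host are placeholders.
import requests

index = "logs-2016.12.01"
stats = requests.get("http://localhost:9200/{}/_stats/segments".format(index)).json()

seg = stats["indices"][index]["total"]["segments"]
print("terms_memory_in_bytes:", seg["terms_memory_in_bytes"])
print("segment memory_in_bytes:", seg["memory_in_bytes"])
```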
@Brad_Jungsu_Heo, we have not yet solved the issue other than by throwing more hardware at those nodes .. and ultimately we may choose to either continue with that solution or reduce the total number of days that we store. Re: the image, we use Datadog for all of our monitoring here, so those graphs were built with Datadog.
In my case, after force merging down to 1 segment, segments.terms_memory decreased by about 30%. (Note that this percentage depends on your index characteristics.)
I hope segment merging helps you avoid adding more nodes (or resources, or money).
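In case it's useful, the call was roughly this (index pattern and host are placeholders). Force merging is I/O heavy, so it is best limited to indices that are no longer being written to:

```python
# Sketch: force merge read-only indices down to a single segment per shard
# using the _forcemerge API (ES 2.1+). Index pattern and host are placeholders.
import requests

resp = requests.post(
    "http://localhost:9200/logs-*/_forcemerge",
    params={"max_num_segments": 1},
)
print(resp.status_code, resp.json())
```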
Thanks for that feedback -- we're still in the process of merging a ton of our old segments, so I'll check the memory usage when we've finished that process.
We are going to start going through and tuning our mappings to be more specific rather than wildly general purpose. This is gonna take some time though, but I'll post our results here when we have em.