Lowering memory consumption

We're using Elasticsearch to analyze/visualize social media data.

We're currently limited in how much data we can put on a server because of fielddata. Most fields are integers or non-analyzed strings, for which we use doc values. But a few analyzed text fields (header/body) cannot use doc values, and thus always load fielddata. (We use the fielddata of these text fields to generate tag clouds with the significant_terms aggregation.)
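
For context, our mappings look roughly like this (simplified sketch; the index, type, and non-text field names are illustrative, header/body are the real analyzed fields):

```
# Simplified mapping sketch: numbers and not_analyzed strings use doc values,
# while the analyzed text fields (header/body) fall back to in-memory fielddata.
curl -XPUT 'localhost:9200/messages' -d '{
  "mappings": {
    "message": {
      "properties": {
        "author_id": { "type": "integer" },
        "channel":   { "type": "string", "index": "not_analyzed", "doc_values": true },
        "header":    { "type": "string" },
        "body":      { "type": "string" }
      }
    }
  }
}'

# The tag clouds come from significant_terms on an analyzed field,
# which is what forces the fielddata to load:
curl -XGET 'localhost:9200/messages/_search' -d '{
  "size": 0,
  "query": { "match": { "body": "some topic" } },
  "aggs": {
    "tagcloud": {
      "significant_terms": { "field": "body", "size": 50 }
    }
  }
}'
```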

This fielddata limits us to around 100M messages per server, which means we quite regularly need to purchase more servers.

The servers are not doing much with CPU/disk; the workload is mainly memory-bound.

So far I have these ideas:

  • Put more memory in each server (we now have 128 GB; we could go to 256 GB)
  • Disable/remove fielddata for older data, meaning tag clouds become unavailable there (see the sketch below)
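
For the second idea, this is roughly what I had in mind, per older index (sketch only; if I read the docs right, fielddata settings can be updated on a live index, and a disabled fielddata format makes aggregations on that field fail rather than load fielddata; the index name is illustrative):

```
# Sketch: turn off fielddata for the analyzed fields on an old index.
# Tag clouds (significant_terms) on these fields would then error out there.
curl -XPUT 'localhost:9200/messages-2014/_mapping/message' -d '{
  "properties": {
    "header": { "type": "string", "fielddata": { "format": "disabled" } },
    "body":   { "type": "string", "fielddata": { "format": "disabled" } }
  }
}'
```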

I'm hoping for more ideas that would let us put more data on a server, e.g. optimizations or other tricks.

Maybe fielddata filtering could help in your case. You could configure your analyzed fields to only load terms that have a frequency of at least 2 per segment, for instance. I am not entirely sure it works with significant_terms, however.
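
Something along these lines in the mapping, for instance (untested sketch; as far as I remember, min values above 1.0 are treated as absolute per-segment frequencies, and values at or below 1.0 as percentages):

```
# Sketch: only load terms whose per-segment document frequency is >= 2.
curl -XPUT 'localhost:9200/messages/_mapping/message' -d '{
  "properties": {
    "body": {
      "type": "string",
      "fielddata": {
        "filter": {
          "frequency": { "min": 2 }
        }
      }
    }
  }
}'
```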

Thanks, I had not thought of that! I'll experiment with it.