We're using Elasticsearch to analyze and visualize social media data.
We're currently limited in how much data we can fit on a server because of fielddata. Most fields are integers or non-analyzed strings, for which we use doc values. But a few analyzed text fields (header/body) cannot use doc values, so their fielddata is always loaded into the heap. (We use the fielddata of these text fields to generate tag clouds with the significant_terms aggregation.)
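For context, the tag-cloud query looks roughly like this (the index name `messages`, the `body` field, and the query term are placeholders for our actual setup):

```json
POST /messages/_search
{
  "size": 0,
  "query": { "match": { "body": "some topic" } },
  "aggs": {
    "tagcloud": {
      "significant_terms": { "field": "body", "size": 50 }
    }
  }
}
```

Since `body` is an analyzed text field, running significant_terms on it forces its fielddata to be built and held in heap, which is where the memory goes.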
This fielddata limits us to roughly 100M messages per server, which means we quite regularly need to purchase more servers.
The servers are not doing much with CPU or disk; the workload is mainly memory-bound.
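(For anyone who wants to verify where the heap goes on a similar setup, per-field fielddata usage can be inspected with the cat API; the field names here are just our examples:)

```
GET /_cat/fielddata?v&fields=header,body
```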
So far I have had these ideas:
- Put more memory in each server (we now have 128 GB; we could go to 256 GB)
- Disable/remove fielddata for older data (meaning tag clouds become unavailable for that data)
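A sketch of how the second idea could look, assuming time-based indices (one per month) so that fielddata can be turned off per index as it ages; the exact mapping syntax varies between Elasticsearch versions, and the index/field names are placeholders:

```json
PUT /messages-2015.01/_mapping
{
  "properties": {
    "body": { "type": "text", "fielddata": false }
  }
}
```

After this, significant_terms on `body` would fail for that index, so tag-cloud queries would have to skip the older indices.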
I'm hoping for more ideas that would let us put more data on a server, e.g. optimizations or other tricks.