I have fields in my mapping file that are not being used by the incoming data. Are those (unused) fields taking up space or impacting performance in my Elasticsearch cluster, or is Elasticsearch smart enough not to allocate space for a field until there is data to occupy that field?
I am using Elasticsearch version 5.6.9.
The answer is complicated, but generally you'll "pay" some overhead for those empty fields, though not as much as you'd expect from allocating x bytes per missing field (with one exception, described at the end).
For example, Lucene has a variety of ways to determine how sparse or dense a field is, and adjusts its encoding accordingly. A simple example: suppose a field has a value in only one document. You could write that value and then encode a bunch of null results for the rest of the documents. Or, you could just make a note of the single document that has the value and store nothing else.
Another simple example: all the documents have a value for the field, but the values are identical (one distinct value). You could write the value over and over, or just make a note that all the values are shared and record it once.
Those are trivial examples, but they give you an idea of some of the optimizations (at a high level) that Lucene does. It's hard to say exactly how much overhead you pay for the empty fields in documents, but it's somewhere between "none" and "all".
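To make those two encoding tricks concrete, here's a toy sketch (this is NOT Lucene's actual on-disk format, just an illustration of the idea): instead of reserving a slot for every document, you can record only the documents that actually carry a value, and if every present value is identical you can store the value once.

```python
def dense_encode(values):
    """Naive encoding: one slot per document, None for missing values."""
    return list(values)

def sparse_encode(values):
    """Record only (doc_id, value) pairs for documents that have the field."""
    return [(doc_id, v) for doc_id, v in enumerate(values) if v is not None]

def constant_encode(values):
    """If every present value is identical, store it once plus the doc ids."""
    present = [v for v in values if v is not None]
    if len(set(present)) == 1:
        return {"value": present[0],
                "doc_ids": [i for i, v in enumerate(values) if v is not None]}
    return None  # more than one distinct value; this trick doesn't apply

# One value among a million documents: the sparse form is a single pair,
# while the dense form would carry a million mostly-empty slots.
docs = [None] * 1_000_000
docs[42] = "hello"
print(sparse_encode(docs))  # [(42, 'hello')]
```

The real engine picks among many such representations per field and per segment, which is why the cost of a mostly-empty field lands somewhere between "none" and "all".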
A related blog post about this kind of encoding (in the context of filter caching) is here: https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
The only exception to the above is if the field is "mapped" (it's in the mapping) but none of the documents actually use that field. In that case there is essentially no overhead: nothing is allocated at the data level if no document uses the field (there is some minor overhead at the mapping level, since the mapping itself has to be stored, but not at the data level).
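As a concrete illustration (index, type, and field names here are made up), a 5.x mapping like the following declares `never_used`, but until a document actually supplies that field, its only cost is the few bytes it occupies in the mapping itself:

```
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title":      { "type": "text"    },
        "never_used": { "type": "keyword" }
      }
    }
  }
}
```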
Thank you, Zachary. This is very helpful.
Regards - T.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.