Unused mapping fields and impact on space/performance

sparkjonez · January 28, 2019, 8:21pm

Hello everyone,

I have fields in my mapping file that are not being used by the incoming data. Are those (unused) fields taking up space or impacting performance in my Elasticsearch cluster, or is Elasticsearch smart enough not to allocate space for a field until there is data to occupy that field?

I am using Elasticsearch version 5.6.9.

polyfractal · January 28, 2019, 10:46pm

The answer is complicated. Generally you'll "pay" some overhead for those empty fields, but not as much as you'd expect from allocating x bytes per missing field (with one exception, described at the end).

E.g. Lucene has a variety of ways to determine how sparse/dense a field is, and adjusts its encoding accordingly. A simple example is a field which only has one value in one document. You could write that value, then encode a bunch of null results for the rest of the documents. Or, you could just make a note of the single document that has the value and nothing else.

Another simple example: all the documents have a value for the field, but the values are all identical (one distinct value). You could rewrite the value over and over, or just make a note that all values are shared and record it once.
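The "one distinct value" case can be sketched the same way. Again this is a toy illustration under made-up names (`encode_column`, `decode_column`), not Lucene's real format: if every document shares the same value, store it once along with the document count.

```python
def encode_column(values):
    """Store a single shared value once; otherwise keep one value per document."""
    if len(set(values)) == 1:
        return {"kind": "constant", "value": values[0], "count": len(values)}
    return {"kind": "per_doc", "values": list(values)}


def decode_column(encoded):
    """Rebuild the per-document list from either encoding."""
    if encoded["kind"] == "constant":
        return [encoded["value"]] * encoded["count"]
    return list(encoded["values"])


same = ["us-east"] * 5
mixed = ["us-east", "eu-west", "us-east"]

print(encode_column(same)["kind"])   # constant
print(encode_column(mixed)["kind"])  # per_doc
```

Decoding gives back the original list in both cases, so nothing is lost by recording the shared value only once.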

Those are trivial examples, but they give you an idea of some of the optimizations (at a high level) that Lucene does. It's hard to say how much overhead you pay for the empty fields in documents, but it's somewhere between "none" and "all".

A related blog post was written about filter caching here: https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps

The only exception to the above is if the field is "mapped" (it's in the mapping) but none of the documents actually use that field. In that case there is no overhead: nothing is allocated if none of the documents actually use the field (mostly; there's some minor overhead at the mapping level but not at the data level).

sparkjonez · January 28, 2019, 11:24pm

Thank you, Zachary. This is very helpful.

Regards - T.

