Reducing Index Footprint

Hi there!

My team is looking for ways to optimize large Elasticsearch indices by reducing the index footprint.

I read this article about Dynamic Mapping and was wondering whether it can be used to reduce index footprint (I know that this is also the default mapping configuration in Elasticsearch).
By using Dynamic Mapping, we add certain fields only to the documents that really use them (rather than adding them to all documents).
For cases where most documents don't use a certain property we can potentially reduce index size significantly, right?

For example, we may have a basic document mapping that includes "first_name", "last_name" and "main_email". However, some documents (5%) may also include another field, "additional_email".
So why have 4 fields for all documents when we can save quite a lot of space using dynamic mapping (95% of the documents need only 3 fields), right?

Also, using Dynamic Templates we can ensure that the mapping type for dynamically added fields (such as "additional_email" in the example above) is predefined rather than automatically selected by Elasticsearch.
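
To make the idea concrete, here is a minimal sketch of what such an index body might look like, written as a Python dict that could be sent as the JSON body when creating the index. The template name "extra_emails_as_keyword", the "*_email" pattern and the chosen field types are assumptions for illustration only, and the typeless layout assumes Elasticsearch 7 or later (older versions nest the mappings under a type name).

```python
import json

# Hypothetical index body: three explicitly mapped fields plus one dynamic
# template. The template name, the "*_email" pattern and the field types are
# illustrative assumptions, not taken from a real cluster.
index_body = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text"},
            "last_name": {"type": "text"},
            "main_email": {"type": "keyword"},
        },
        "dynamic_templates": [
            {
                "extra_emails_as_keyword": {
                    # Applies only to new string fields whose name ends in "_email".
                    "match_mapping_type": "string",
                    "match": "*_email",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
    }
}

# Print the JSON that would be used as the request body when creating the index.
print(json.dumps(index_body, indent=2))
```

With a body like this, a document that arrives with "additional_email" would get that field mapped as keyword instead of the default string mapping. The template only controls how the new field is mapped.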

Is this use-case valid?
Is this really a way to reduce index footprint?
Are there other effective ways to reduce index footprint?

Thanks!
Shmoolik

I don't really know about Dynamic Templates, but you can use Logstash to filter out the fields that you don't need. If you are not using Logstash, then sorry, I can't offer anything helpful.

The way I do it is to check whether the additional_email field contains data: if it does, index it; if not, remove the field. Quite easy, actually.
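
Without Logstash, the same check can be done in application code before the document is sent for indexing. This is only a sketch: the helper name prune_empty_fields and the definition of "empty" (None, empty string, empty list) are assumptions, not part of any Elasticsearch client.

```python
def prune_empty_fields(doc):
    """Return a copy of doc without fields whose value is None, "" or []."""
    def is_empty(value):
        return value is None or value == "" or value == []

    return {field: value for field, value in doc.items() if not is_empty(value)}


# Example: the empty additional_email is dropped before indexing.
doc = {
    "first_name": "Ann",
    "last_name": "Lee",
    "main_email": "ann@example.com",
    "additional_email": "",
}
print(prune_empty_fields(doc))
```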

Thanks @lusynda for your reply!
We're not using Logstash so this doesn't seem to be relevant.

Would love to get an "insider" answer for this, since I guess this is more of an Elasticsearch internal implementation question... Can Dynamic Mapping potentially reduce index footprint?

The mappings and index settings you use will affect the index size. Dynamic mapping just tells Elasticsearch how to map new fields, so in itself it does not affect size, although the mapping it is configured to apply does.

Thanks a lot for your answer and for the link @Christian_Dahlqvist!

I'm not sure I completely understand - please let me rephrase my question using an example.
Given that I need Elasticsearch to index all fields, please consider the following use-cases:

Use-case 1
We use static mapping and push 1M documents.
The static mapping defines that each document has the following fields: "first_name", "last_name" and "email".
The index footprint is now X.
Now, we insert 1 more document.
It is understood that the index footprint will be X+1 (no re-indexing is required).

Use-case 2
We use dynamic mapping and push 1M documents.
Each of these million documents was inserted with the following fields: "first_name" and "last_name" (and only these fields).
The index footprint is now Y.
Now, we insert 1 more document that has the "first_name" and "last_name" fields, but also adds a new field, "email".
How will this affect the index footprint?
Will the index footprint increase only by 1 document, or will all 1M documents be updated (re-indexed) to include the new "email" field (with null value)?
Given that all documents from both use-cases are the same, which one is larger in size - the last footprint of use-case 1 or the last footprint of use-case 2?

Elasticsearch supports sparse documents and only indexes the fields that are present in the document. Whether you use dynamic or static mapping here is irrelevant.
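
As a rough illustration of why that is, here is a toy per-field inverted index in Python. It is only a simplified model under stated assumptions (whitespace tokenization, one posting per field/term/document), not how Lucene actually stores data, and the names build_postings and count_entries are made up for this sketch.

```python
from collections import defaultdict


def build_postings(docs):
    """Toy inverted index: field -> term -> set of ids of docs containing it."""
    postings = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(docs):
        # Only the fields actually present in the document are visited.
        for field, value in doc.items():
            for term in str(value).lower().split():
                postings[field][term].add(doc_id)
    return postings


def count_entries(postings):
    """Total number of (field, term, doc id) entries in the toy index."""
    return sum(len(ids) for terms in postings.values() for ids in terms.values())


# 1000 documents with two fields each, as in use-case 2.
docs = [{"first_name": "Ann", "last_name": "Lee"} for _ in range(1000)]
before = count_entries(build_postings(docs))

# One more document that introduces a new "email" field.
docs.append({"first_name": "Bob", "last_name": "Ray", "email": "bob@example.com"})
after = count_entries(build_postings(docs))

# Only the new document's three fields add entries; the other 1000 are untouched.
print(after - before)
```

In this model the existing documents gain nothing when the new field appears, because a document contributes entries only for the fields it contains.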

Thank you @Christian_Dahlqvist for your prompt response.
It is much clearer now.
