Elastic Mapping explosion


(Nikesh) #1

Hi,

I have a few questions about the number of fields in a mapping and the dynamic addition of new fields.
What causes a mapping explosion?
Is it a high number of fields in the mapping (more than 1000 in my case), or a huge number of documents?
Are there any other causes of a mapping explosion?


(Mark Harwood) #2

Your source of data.
An example: getting the keys and values of something like "customer_id": "N343242394638" the wrong way round in your application code would be a good way of generating a lot of unique field names, one per customer ID, all holding the same value string "customer_id".
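
A minimal sketch of that mix-up, in plain Python with no Elasticsearch client involved. The function names `build_doc_correct`, `build_doc_swapped` and `distinct_field_names`, and the generated sample IDs, are hypothetical; the point is only that the swapped version yields one new field name per customer.

```python
def build_doc_correct(customer_id: str) -> dict:
    """Intended document: a fixed field name with a varying value."""
    return {"customer_id": customer_id}


def build_doc_swapped(customer_id: str) -> dict:
    """Buggy document: the ID became the field name, the name became the value."""
    return {customer_id: "customer_id"}


def distinct_field_names(docs: list) -> set:
    """Collect every top-level key, roughly what dynamic mapping would see as fields."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return names


# Hypothetical sample IDs, generated only for illustration.
customer_ids = [f"N{n:012d}" for n in range(1000)]

correct_fields = distinct_field_names([build_doc_correct(c) for c in customer_ids])
swapped_fields = distinct_field_names([build_doc_swapped(c) for c in customer_ids])

print(len(correct_fields))  # 1
print(len(swapped_fields))  # 1000
```

With dynamic mapping left on, each of those 1000 keys would become its own mapped field.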


(Nikesh) #3

@Mark_Harwood Thanks for the response.
I understand from your reply that we should avoid generating lots of unique field names unnecessarily.
But when none of the fields can be dropped and there are more than 1000 of them, how can we prevent a mapping explosion?
Are there any other causes of mapping explosions?
Also, what are the consequences of a mapping explosion?


(Mark Harwood) #4

By carefully controlling what JSON you pass or, if you can't, by declaring what your indexing policy is for any new fields: ignore, accept or error.

If you're not interested in searching or aggregating certain fields that may appear in your docs you can simply choose to ignore them in your index mappings. They'll still exist in the stored JSON blob but won't be unpacked and added to any kind of index or doc-values storage.
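
The three policies could be sketched as mapping bodies like the one below, written as Python dicts that would be serialized as the JSON body of an index-creation request. The `dynamic` and `enabled` mapping parameters are documented Elasticsearch settings; the field names `customer_id` and `payload` and the helper `mapping_with_policy` are made up for illustration.

```python
import json

# Map the three policies onto values of the `dynamic` mapping parameter.
DYNAMIC_POLICIES = {
    "accept": True,     # new fields are added to the mapping automatically
    "ignore": False,    # new fields stay in _source but are not mapped or indexed
    "error": "strict",  # documents containing unmapped fields are rejected
}


def mapping_with_policy(policy: str) -> dict:
    """Build an index-creation body for one policy (field names are hypothetical)."""
    return {
        "mappings": {
            "dynamic": DYNAMIC_POLICIES[policy],
            "properties": {
                "customer_id": {"type": "keyword"},
                # An object kept in _source only: with enabled=false its
                # contents are not parsed or indexed.
                "payload": {"type": "object", "enabled": False},
            },
        }
    }


print(json.dumps(mapping_with_policy("error"), indent=2))
```

Choosing `"ignore"` keeps unknown fields in the stored JSON without growing the mapping, while `"error"` surfaces unexpected fields at index time instead of silently dropping them from search.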

Other causes: anything that can introduce new fields into the provided JSON.

Consequences: Elasticsearch rejections because you exceeded the permitted number of mapped fields. Each mapped field comes with overheads (disk + RAM), so the set of fields shouldn't become an unbounded collection.
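
To see how close a mapping is to the permitted number, a rough count can be taken from the mapping JSON itself. The sketch below is an approximation, not Elasticsearch's own counting logic: it assumes every entry under `properties` counts once (object fields included) and that each multi-field under `fields` counts as well. `count_mapped_fields` and the sample mapping are hypothetical.

```python
def count_mapped_fields(mapping: dict) -> int:
    """Approximate the number of mapped fields in a mapping or sub-object."""
    total = 0
    for field_def in mapping.get("properties", {}).values():
        total += 1                                 # the field (or object) itself
        total += len(field_def.get("fields", {}))  # multi-fields such as .keyword
        total += count_mapped_fields(field_def)    # properties of nested objects
    return total


# Hypothetical mapping used only to exercise the function.
sample_mapping = {
    "properties": {
        "customer_id": {"type": "keyword"},
        "name": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
        "address": {
            "properties": {
                "city": {"type": "keyword"},
                "zip": {"type": "keyword"},
            }
        },
    }
}

field_count = count_mapped_fields(sample_mapping)
print(field_count)  # 6
```

Running this against the mapping returned for a real index gives a quick estimate to compare with the configured field limit.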


(Nikesh) #5

How and where is the mapping stored within Elasticsearch? How much overhead does it cause on disk and in RAM?


(Nikesh) #6

Thanks for the quick replies.
To add to my previous question:

  1. Is there a fixed limit on the number of fields that can be stored? I see the default value is 1000 fields, but I have a situation where I have to store 1500 fields.
  2. Is there any alternative way to prevent a mapping explosion?

(Mark Harwood) #7

The mapping is stored in the "cluster state", which is shared with every node.

A small part of the overhead is fixed (the set of field definitions in the cluster state) and the larger part varies with the number of documents in the index. More fields = more entries in the search index data structures and RAM-based caches.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings
Example use here
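
For the 1500-field case, the relevant setting on that page is `index.mapping.total_fields.limit`. A minimal sketch of an update-settings request follows, built as a Python dict rather than sent to a cluster; the index name `my-index` and the value 2000 are assumptions chosen for illustration.

```python
import json

INDEX_NAME = "my-index"  # hypothetical index name
NEW_FIELD_LIMIT = 2000   # assumed headroom above the 1500 fields in question

# Body for an update-settings request that raises the soft limit on mapped fields.
settings_body = {"index.mapping.total_fields.limit": NEW_FIELD_LIMIT}

request_line = f"PUT /{INDEX_NAME}/_settings"
print(request_line)
print(json.dumps(settings_body))
```

Raising the limit only lifts the guard rail. Each mapped field still carries its own disk and RAM overhead, so the value is best kept close to what the data actually needs.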


(Nikesh) #8

Thanks for the response.
I have gone through the provided links and understand that the default limit is 1000 fields.
I have 2500 static fields specified at index creation time. Is there a specific number of static fields that can cause a mapping explosion?


(Mark Harwood) #9

No, the same way there isn't a specific number that causes "a large crowd of people".


(Nikesh) #10

Okay, thanks for the reply.
A follow-up to my previous question:
If there are 2000 fields in my mapping file but the number of documents I am indexing is low (50,000), can a mapping explosion occur with so little data?


(Mark Harwood) #11

We may be talking at cross-purposes.
I don't think of "a mapping explosion" as a specific error or event.
I think of it as a general condition of having a lot of fields.

It's a condition that can lead to a number of problems (memory pressure, delays publishing cluster state, etc.) and is the reason we introduced a soft limit on the number of fields in mappings.

Nikesh wrote: "If there are 2000 fields in my mapping file but the number of documents I am indexing is low (50,000)."

Sounds like a lot of fields for users to consider/search but shouldn't be too much of a problem.


(Nikesh) #12

Thanks @Mark_Harwood for the response. I would like to understand a few more things. Could you please point me to resources that explain the following?
How does a mapping work internally within Elasticsearch?
Is it consulted for every search query?