12K fields in the mapping


We're approaching 12K fields in our mapping. Our cluster does not hold that many documents (about 1 billion, one document per user, so it is not growing very rapidly).

I'd like to know: how much further can we grow the number of fields before the cluster becomes unusable? Which metrics will we see start to degrade first?

Another question: how can we redesign our cluster once we hit this limit? We would like to avoid partitioning our customers across several clusters/indices just to keep the mapping size low.


Welcome to our community! :smiley:

What's the reasoning behind having so many fields?


The reason is that the index serves a lot of customers, where each customer can define their own set of fields (up to 100), and we use dynamic mapping for this.
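To illustrate why the field count grows: with dynamic mapping, indexing a document that contains a previously unseen field silently adds that field to the index mapping. A minimal sketch (the index and field names here are made up for illustration):

```json
PUT customer-data/_doc/1
{
  "customer_id": "acme",
  "custom_priority": "urgent"
}

GET customer-data/_mapping
```

After the first request, `custom_priority` shows up in the mapping (by default as a `text` field with a `keyword` subfield), so every new per-customer field permanently enlarges the mapping.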

Indeed, mapping explosion is a valid concern. A large mapping takes heap memory; it is part of the cluster state that gets synchronized between nodes, and the bigger the cluster state, the slower that synchronization can be. The size of the mapping also affects indexing speed, because as you index a new document, its fields are checked against the existing index mapping.

For an index with a very large number of fields, we recommend using the ES flattened field type. It has some limitations for search, but from the ES point of view it is a single field: all of its subfields, with their names and values, are stored on disk, which allows an effectively "unlimited" number of subfields.
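A sketch of what such a mapping could look like (the index and field names are hypothetical):

```json
PUT customer-data
{
  "mappings": {
    "properties": {
      "customer_id": { "type": "keyword" },
      "custom": { "type": "flattened" }
    }
  }
}

PUT customer-data/_doc/1
{
  "customer_id": "acme",
  "custom": { "priority": "urgent", "region": "emea" }
}
```

Here `custom` counts as a single field in the mapping no matter how many subfields customers add. Leaf values are indexed as keywords, so exact-match queries like a term query on `custom.priority` work, but analyzed full-text search on subfields does not.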


Thanks. How does it compare to the method where you create an array like this:

    { "keyAndValue": "priority:urgent" },
    { "keyAndValue": "priority:high" }

in terms of performance?

It depends on what kind of queries you are using.
If you filter or search on that field, keyAndValue may not be a good option in most cases, and a flattened field is better.
If you only filter or search on other fields and just store and retrieve this data, you can use an object field with enabled: false.
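For the second case, a minimal sketch of such a mapping (index and field names are made up):

```json
PUT customer-data
{
  "mappings": {
    "properties": {
      "custom": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
```

With `enabled: false`, everything under `custom` is kept in `_source` and returned when you fetch the document, but it is not parsed or indexed at all: it adds nothing to the mapping and cannot be searched.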

