Maximum number of fields in an index mapping

Hi,

We are using Elasticsearch 2.3. In this version there is no limit on the number of fields in an index mapping, but ES 5 introduces a default limit of 1000. We want to understand the reasoning behind it. We already have over 7000 fields in an index and are currently doing well, but we are expecting a huge increase in the near future. So,

  1. Could anyone point us to some documentation on why we shouldn't have too many fields in an index?
  2. What problems could occur because of this?
  3. What is the maximum limit on the number of fields in ES 2.3?

Thanks!

There are three main reasons:

Cluster state overhead: the mapping for each index is stored in the cluster state, which is shared among all nodes. Any change to the cluster state (such as adding a new field) has to be published to and refreshed on every node. Very large mappings add a non-negligible amount of data that must be serialized and sent over the wire each time. It might seem unimportant, but periodically serializing a few MB of cluster state really adds up over time, and can add unwanted latency to regular actions.
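A rough way to see how big this is in practice is to pull the cluster state and mapping yourself. A minimal sketch, assuming a node reachable at http://localhost:9200 and an index named `my-index` (both placeholders):

```python
import json
import requests

ES = "http://localhost:9200"  # assumption: a locally reachable node

# Approximate size of the serialized cluster state that every node
# holds and that gets republished when a mapping changes.
state = requests.get(ES + "/_cluster/state").json()
print("cluster state ~ %.1f MB" % (len(json.dumps(state)) / 1e6))

def count_leaf_fields(properties):
    """Recursively count leaf fields in a mapping 'properties' tree."""
    total = 0
    for field in properties.values():
        if "properties" in field:  # object field: recurse into children
            total += count_leaf_fields(field["properties"])
        else:
            total += 1
    return total

# Count mapped fields per type in one index (2.x mappings have types).
mappings = requests.get(ES + "/my-index/_mapping").json()
for index, body in mappings.items():
    for doc_type, mapping in body["mappings"].items():
        print(index, doc_type, count_leaf_fields(mapping.get("properties", {})))
```

(Multi-fields under a `fields` key are not counted here; this is only meant to give an order of magnitude.)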

Sparsity: generally, people who have thousands of fields also tend to have very sparse fields, i.e. each document populates only a handful of those thousands of fields. This makes the data structures stored on disk very inefficient (less so in newer versions, but still not ideal) because the data is so sparse. You tend to see this kind of behavior when ES is used as a blind key:value store, or when multiple tenants share the same index and are allowed to create whatever fields they want.
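When the sparsity comes from arbitrary per-tenant attributes, one common alternate scheme is to fold them into a single nested field of key/value pairs, so the mapping stays at a handful of fields no matter how many attribute names appear. A hedged sketch (the index, type, and field names are illustrative; `keyword` is the ES 5 type, on 2.x you would use a not_analyzed `string` instead):

```python
import requests

ES = "http://localhost:9200"  # assumption: a locally reachable node

# One nested field of {name, value} pairs instead of one top-level
# field per attribute name: new attributes no longer grow the mapping.
mapping = {
    "mappings": {
        "doc": {
            "properties": {
                "attrs": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "keyword"},
                        "value": {"type": "keyword"},
                    },
                }
            }
        }
    }
}
requests.put(ES + "/kv-demo", json=mapping)

# Arbitrary attributes now live in the data, not in the mapping.
doc = {"attrs": [{"name": "color", "value": "red"},
                 {"name": "size", "value": "XL"}]}
requests.post(ES + "/kv-demo/doc", json=doc)
```

The trade-off is that lookups become nested queries matching a name/value pair rather than plain field queries.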

Lucene overhead: in short, having thousands of fields eats up a certain amount of fixed, per-field overhead in Lucene (field infos, norms, and other per-field data structures), independent of how many documents actually use each field.
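To put a number on it, the segment stats break out the in-heap memory Lucene is using per index, several components of which scale with field count rather than document count. A sketch against the same assumed local node:

```python
import requests

ES = "http://localhost:9200"  # assumption: a locally reachable node

# _stats/segments reports per-index Lucene memory; terms, norms, and
# doc values all have per-field components.
stats = requests.get(ES + "/_stats/segments").json()
for index, data in sorted(stats["indices"].items()):
    seg = data["total"]["segments"]
    print(index,
          "segments:", seg["count"],
          "terms:", seg.get("terms_memory_in_bytes"),
          "norms:", seg.get("norms_memory_in_bytes"),
          "doc_values:", seg.get("doc_values_memory_in_bytes"))
```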

The limit is a soft limit, so you can change it if you want. But it's there for a reason, namely that we think >1000 fields is starting to get abusive, and we'd recommend trying to pare down your fields with some kind of alternate scheme, such as the key/value remodeling sketched above. 🙂
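For reference, the ES 5 soft limit lives in the dynamic index setting `index.mapping.total_fields.limit` (default 1000), so if you decide the trade-off is acceptable it can be raised per index. A sketch, with an illustrative index name:

```python
import requests

ES = "http://localhost:9200"  # assumption: a locally reachable node

# Raise the per-index soft limit on total mapping fields (ES 5+).
# Doing so trades away the protection described above.
requests.put(ES + "/my-index/_settings",
             json={"index.mapping.total_fields.limit": 2000})
```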


Thanks for your answer! I have a couple more questions.

Cluster state overhead: Is there a way to figure out the maximum cluster state overhead (in terms of size or number of fields) that we can tolerate for a given cluster size? We closely monitor various ES metrics and the latencies of all our regular actions, and do not see any large latencies as of now. Will there be a sudden tip-over if the cluster state overhead crosses some limit?

Lucene overhead: Can you provide more information on this? Is there a way to track this overhead as the mapping size increases?


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.