Too many properties: should we increase the property limit or use a nested approach and increase that limit?

Hi there,

We have a situation with limits in the mapping, and I am not sure which way to go, as there are multiple possible solutions.

I will start by describing the use case:

  • there are multiple tenants, each with their own Elasticsearch instance and a single index
  • there are multiple object types (currently ranging from 10 to 500 per tenant)
  • each object type has multiple attributes, unique to that object type (currently ranging from 10 to 2,000 per object type)
  • each object type has a fluctuating number of documents; some only have 10-100, some have 500,000

In older Elasticsearch versions, this was done by creating a mapping type per object type and storing each attribute in that mapping type. But current versions only allow a single mapping per index.

We implemented this by creating a unique property key based on object type and attribute name, and storing these keys in the single mapping of the tenant index.
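For illustration, a simplified sketch of what such a mapping looks like (the index name and attribute names are invented for this example):

```json
PUT /tenant-index
{
  "mappings": {
    "properties": {
      "ITEM_ID":       { "type": "keyword" },
      "ITEM_PRICE":    { "type": "double"  },
      "SUPPLIER_ID":   { "type": "keyword" },
      "SUPPLIER_NAME": { "type": "text"    }
    }
  }
}
```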

This causes issues, as the total number of attributes in the mapping will exceed 1,000, the default value of `index.mapping.total_fields.limit`.
I read the documentation about this, and it warns not to increase that limit, as doing so can lead to issues and a mapping explosion.

On this board I found some info:

Using a flattened structure is not usable in our use case, as we need advanced filtering and querying. Splitting the index is also not easy, since we don't know in advance how many documents an object type will have, and it is overkill to create a new index for only a handful of documents. Even for the biggest object types a dedicated index would be overkill: most of the time there are one or two large object types and several small ones, but which ones is not known beforehand.

I also read the blog post Too many fields! 3 ways to prevent mapping explosion in Elasticsearch | Elastic Blog, but that does not help us either, as we actually need that many fields and cannot generate them on the fly.

I found another solution in this ticket: Limit of total fields [1000] exceeded - #17 by dadoonet

Currently we have implemented a proof-of-concept using nested fields (sketched below):

  • we create a nested field for each object type
  • the nested field contains the properties of the object type
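A simplified sketch of this proof-of-concept mapping, again with invented names:

```json
PUT /tenant-index
{
  "mappings": {
    "properties": {
      "ITEM": {
        "type": "nested",
        "properties": {
          "ITEM_ID":    { "type": "keyword" },
          "ITEM_PRICE": { "type": "double"  }
        }
      },
      "SUPPLIER": {
        "type": "nested",
        "properties": {
          "SUPPLIER_ID":   { "type": "keyword" },
          "SUPPLIER_NAME": { "type": "text"    }
        }
      }
    }
  }
}
```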

This seems to work: it limits us to 1,000 object types instead of 1,000 attributes, and 1,000 object types is not a real-world scenario for us.

However, this hits another limit: by default an index may only contain 50 nested fields (`index.mapping.nested_fields.limit`).

What would be best to do (both settings are sketched below)?

  • store each property on the main mapping, and increase the property limit from 1,000 to something like 50,000
  • increase the nested field limit from 50 to something like 1,000
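For reference, both limits are dynamic index settings, so raising them would look something like this (index name invented):

```json
# Option 1: raise the total fields limit
PUT /tenant-index/_settings
{
  "index.mapping.total_fields.limit": 50000
}

# Option 2: raise the nested fields limit
PUT /tenant-index/_settings
{
  "index.mapping.nested_fields.limit": 1000
}
```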

And what would be the performance consequences for each of these 2 solutions?

A typical tenant has about 100 object types, with 2,000 attributes in total and about 500,000 documents spread across these object types.

Thanks!

@obi-wan - Hello there! And welcome to the community!

This sounds like a fun use-case!

Can you elaborate on the advanced filtering and querying you need? Does it prevent you from using a key/value mapping for any of these attributes? For instance, maybe a small subset of attributes (50? 100?) become true fields, while the rest of the attributes fall into a key/value schema which can still be filtered and queried.
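Purely as an illustration of what I mean (all names invented), a document could keep a few true fields and push the long tail of attributes into a generic array:

```json
PUT /tenant-index/_doc/1
{
  "NAME": "Widget",
  "STATUS": "ACTIVE",
  "attributes": [
    { "key": "COLOR", "string_value": "red"  },
    { "key": "PRICE", "double_value": 12.50 }
  ]
}
```

If `attributes` is mapped as a `nested` field with a `keyword` key and typed value fields, you can still filter on key/value combinations.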

Does this mean you have to recompile/deploy your app when a new attribute is introduced?

Splitting into multiple indexes can always be a challenge. Are there any data points at the object-type level that you can leverage to perform the split?

Given your potential scale, be cautious of nested fields' performance.

What does your ingestion process look like? Is something constantly updating the originating documents with new data, or is it more of an immutable/append-only store?

Hi @eMitch, thanks for replying!

I will elaborate a bit more on the structure.
These object types are things like PRODUCT, SUPPLIER, LOCATION, CUSTOMER, etc. (but they can be anything the user wants to create; I chose these as easy-to-work-with examples). Something like tables.

These can have attributes of different types, which are also configurable on the fly by the user.
Think of strings, booleans, numbers, dates.
Besides that, it is also possible to configure relations between object types; these are stored in a property with the relation name, which contains an embedded object with an ID and a display value.
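For example (all names and values invented), an ITEM with a relation to its SUPPLIER would be stored like this:

```json
PUT /tenant-index/_doc/42
{
  "ITEM_ID": "item-42",
  "ITEM_NAME": "Widget",
  "ITEM_SUPPLIER": {
    "id": "supplier-7",
    "display": "Acme B.V."
  }
}
```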

Elasticsearch is used as an index on top of a database. The user can configure the schema of the database on the fly. When they add a new object type, or add or remove an attribute on an object type, the application converts that to the Elasticsearch mapping and updates the mapping.
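So when a user adds, say, a PRICE attribute to ITEM, the application issues a mapping update along these lines (names invented):

```json
PUT /tenant-index/_mapping
{
  "properties": {
    "ITEM_PRICE": { "type": "double" }
  }
}
```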

When entries are created in the application (by automation, or manually by a user), they are stored in the real database. When that commit succeeds, it triggers another application which retrieves the data from the database, does some conversions (to make it better searchable, etc.), and then stores the document in Elasticsearch using an upsert.
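The upsert is essentially a standard update-or-insert call; something like this (document ID and fields invented):

```json
POST /tenant-index/_update/item-42
{
  "doc": {
    "ITEM_NAME": "Widget",
    "ITEM_PRICE": 13.0
  },
  "doc_as_upsert": true
}
```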

All attributes in the mapping are unique to their object type. For example, ITEM can have a field ID, and SUPPLIER can have a field ID as well; these should not be stored as the same property, as they can be modified independently in the application. This leads to properties in the mapping called ITEM_ID and SUPPLIER_ID to keep them unique.
We also introduce our own type property, so that we can record on each document which object type it is. In our queries we match on the object type, and then do the actual query filtering (like ID must be 1, PRICE must be > 100, etc.). The user must be able to filter and query on each field of the object type individually, but also combine them in advanced and/or/not constructions.
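A typical query then looks roughly like this (I am calling our type property OBJECT_TYPE here; all names invented):

```json
POST /tenant-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "OBJECT_TYPE": "ITEM" } },
        { "term":  { "ITEM_ID": "1" } },
        { "range": { "ITEM_PRICE": { "gt": 100 } } }
      ]
    }
  }
}
```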
I thought about this for string fields, but we also need to be able to do wildcard searches, etc. on strings, and I read that this is a limitation of flattened fields.

No, the user can edit the schema on the fly. When they do, the mapping gets updated in Elasticsearch. It is not allowed to change existing properties once there is data, only to remove them or create new ones, and these always get unique names, as re-using a property name that has held data is not allowed.

The most logical split would be on object type, as that would group all related data, but it would lead to many useless indices, as some would only contain dozens of documents. I think the multiple-index idea is not of use for us, but it was one of the options.

Say we have 500 object types, each with 50 unique attributes. That would require us to raise the nested field limit to 500. We would store each object type as a nested property, and the actual attributes as properties of that property.
A document can only be of one object type, so each document would only have one nested field filled. Still, each document would become two documents in Lucene: the "main" document, which only points to the "nested" document containing the data.
Would this introduce a huge performance impact? Is this approach worse than keeping everything at the main level and raising the property limit from 1,000 to, say, 10,000?

Mixed; it depends on the user's use case, as it's very flexible. Mostly it is updating existing properties on existing documents with new values, sometimes removing documents, and sometimes creating new ones. At most 10% of the documents are modified daily (most users do less).

Also, after some more local testing, it seems the information in Limit of total fields [1000] exceeded - #17 by dadoonet is incorrect.
For example, take ITEM, which has PRICE, ID and NAME.
When I store them at root level, we get ITEM_PRICE, ITEM_ID and ITEM_NAME, which count as 3 properties.
When I store them at nested level, we get ITEM, ITEM.ITEM_PRICE, ITEM.ITEM_ID and ITEM.ITEM_NAME, which count as 4 properties. Is it supposed to work like this, or am I doing it wrong?
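This is roughly what I tested (index names invented):

```json
# Root-level variant: counts as 3 fields against the total fields limit
PUT /test-root
{
  "mappings": {
    "properties": {
      "ITEM_PRICE": { "type": "double"  },
      "ITEM_ID":    { "type": "keyword" },
      "ITEM_NAME":  { "type": "text"    }
    }
  }
}

# Nested variant: the ITEM field itself seems to count too, so 4 fields
PUT /test-nested
{
  "mappings": {
    "properties": {
      "ITEM": {
        "type": "nested",
        "properties": {
          "ITEM_PRICE": { "type": "double"  },
          "ITEM_ID":    { "type": "keyword" },
          "ITEM_NAME":  { "type": "text"    }
        }
      }
    }
  }
}
```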

My bad, I think I did something wrong and did not fully understand what the post meant.
The whole idea is to make the property names dynamic, right? So use a dynamic key/value approach instead of storing the attributes as fixed fields in the mapping.

This is something that might work, as we have about 15 different data types in our application. All the attributes on the object types are of one of these data types, so that would require about 15 nested properties (below the 50 limit, which is nice).

We would then have 15 properties in the mapping for the different data types, plus 1 extra property holding the object type of a document.

And then, when indexing a document, group its attributes by data type; the value of each property would be a list of key/value pairs, where the key is the actual attribute name and the value is the correctly typed value.

Would this approach even require nested fields? I think if we don't specify them as nested, we lose the coupling between key and value, right? And flattened only works with strings and basic filtering?
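To make the idea concrete, here is a minimal sketch of the typed key/value approach as I understand it, with one nested bucket per data type (all names invented, only two of the ~15 types shown):

```json
PUT /tenant-index
{
  "mappings": {
    "properties": {
      "OBJECT_TYPE": { "type": "keyword" },
      "string_attrs": {
        "type": "nested",
        "properties": {
          "key":   { "type": "keyword" },
          "value": { "type": "keyword" }
        }
      },
      "number_attrs": {
        "type": "nested",
        "properties": {
          "key":   { "type": "keyword" },
          "value": { "type": "double" }
        }
      }
    }
  }
}

# A nested query keeps key and value coupled within one attribute object:
POST /tenant-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "OBJECT_TYPE": "ITEM" } },
        {
          "nested": {
            "path": "number_attrs",
            "query": {
              "bool": {
                "filter": [
                  { "term":  { "number_attrs.key": "PRICE" } },
                  { "range": { "number_attrs.value": { "gt": 100 } } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
```

As far as I understand, without `nested` the key and value arrays would be flattened into independent field values, so a query could match the key of one attribute against the value of another; the nested mapping is what prevents that cross-matching.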
