Hi @eMitch, thanks for replying!
I will elaborate a bit more on the structure.
These object types are things like PRODUCT, SUPPLIER, LOCATION, CUSTOMER etc. (they can be anything the user wants to create; I chose these because they are easy examples to work with). Think of them as something like tables.
They can have attributes of different types, which are also configurable on the fly by the user.
Think of Strings, Booleans, Numbers, Dates.
Besides that, it is also possible to configure relations between object types. Each relation is stored in a property named after the relation, which contains an embedded object with the ID and display value.
Elasticsearch is used as an index on top of a database. The user can configure the schema of the database on the fly. When they add a new object type, or add or remove an attribute on an object type, the application converts that change to an Elasticsearch mapping and updates the mapping.
When entries are created in the application (by automation, or manually by a user), they are stored in the real database. When that commit succeeds, it triggers another application, which retrieves the data from the database, does some conversions (to make it better searchable etc.), and then stores the document in Elasticsearch using an upsert.
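To make the last step concrete, here is a minimal sketch of what such an upsert could look like as a bulk `update` action with `doc_as_upsert`. This is not our actual code: the `convert` step, the index name `objects`, the document ID format and the field names are all made up for illustration.

```python
import json
from datetime import date


def convert(row):
    """Hypothetical conversion step: drop empty values and make dates JSON-friendly."""
    out = {}
    for key, value in row.items():
        if value is None:
            continue  # do not index empty attributes
        out[key] = value.isoformat() if isinstance(value, date) else value
    return out


def upsert_lines(index, doc_id, row):
    """Build the two NDJSON lines of a bulk 'update' action that acts as an upsert."""
    action = {"update": {"_index": index, "_id": doc_id}}
    # doc_as_upsert: create the document from 'doc' if it does not exist yet,
    # otherwise merge 'doc' into the existing document.
    body = {"doc": convert(row), "doc_as_upsert": True}
    return [json.dumps(action), json.dumps(body)]


# Illustrative row as it might come out of the database.
row = {"ITEM_ID": 1, "ITEM_CREATED": date(2021, 1, 2), "ITEM_NAME": None}
lines = upsert_lines("objects", "ITEM-1", row)
print("\n".join(lines))
```

The two lines would be sent to the `_bulk` endpoint, each terminated by a newline.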
All attributes in the mapping are unique per object type. For example, ITEM can have a field ID, and SUPPLIER can have a field ID as well. These should not be stored as the same property, as they can be modified independently in the application. To keep them unique, they end up in the mapping as ITEM_ID and SUPPLIER_ID.
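As an illustration of this naming scheme, a rough sketch of how a user-defined schema could be turned into mapping properties. The `ES_TYPES` translation table is an assumption for the example (for instance, mapping Strings to `keyword`), not our real conversion.

```python
# Hypothetical translation from the application's attribute types
# to Elasticsearch field types.
ES_TYPES = {
    "STRING": {"type": "keyword"},
    "BOOLEAN": {"type": "boolean"},
    "NUMBER": {"type": "double"},
    "DATE": {"type": "date"},
}


def field_name(object_type, attribute):
    """Prefix the attribute with its object type so names never collide."""
    return f"{object_type}_{attribute}"


def build_properties(schema):
    """Flatten {object type: {attribute: type}} into one root-level 'properties' dict."""
    properties = {}
    for object_type, attributes in schema.items():
        for attribute, attr_type in attributes.items():
            properties[field_name(object_type, attribute)] = dict(ES_TYPES[attr_type])
    return properties


schema = {
    "ITEM": {"ID": "NUMBER", "NAME": "STRING", "PRICE": "NUMBER"},
    "SUPPLIER": {"ID": "NUMBER", "NAME": "STRING"},
}
properties = build_properties(schema)
print(sorted(properties))
# ['ITEM_ID', 'ITEM_NAME', 'ITEM_PRICE', 'SUPPLIER_ID', 'SUPPLIER_NAME']
```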
We also introduce our own type property, so that we can record on each document which object type it is. In our queries we match on the object type first, and then do the actual query filtering (like ID must be 1, PRICE must be > 100 etc.). The user must be able to filter and query on each field of the object type individually, but also combine them in advanced AND/OR/NOT constructions.
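Roughly, such a query could be built as a `bool` query like in the sketch below. The field name `OBJECT_TYPE` and the helper `typed_query` are made up for this example; the clauses themselves (`term`, `range`, `wildcard`) are ordinary query DSL.

```python
def typed_query(object_type, must=(), should=(), must_not=()):
    """Restrict to one object type, then apply the user's AND/OR/NOT clauses."""
    bool_query = {"filter": [{"term": {"OBJECT_TYPE": object_type}}]}
    if must:  # AND
        bool_query["must"] = list(must)
    if should:  # OR: at least one of the clauses has to match
        bool_query["should"] = list(should)
        bool_query["minimum_should_match"] = 1
    if must_not:  # NOT
        bool_query["must_not"] = list(must_not)
    return {"query": {"bool": bool_query}}


# ITEM where ID is 1 and PRICE > 100, but NAME does not start with "test".
query = typed_query(
    "ITEM",
    must=[
        {"term": {"ITEM_ID": 1}},
        {"range": {"ITEM_PRICE": {"gt": 100}}},
    ],
    must_not=[{"wildcard": {"ITEM_NAME": {"value": "test*"}}}],
)
print(query)
```

Nested AND/OR/NOT groups can be expressed by putting another `bool` query inside one of the clause lists.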
I thought about using flattened fields for String fields, but we also need to be able to do wildcard searches etc. on Strings, and I read that this is a limitation of flattened fields.
No, the user can edit the schema on the fly. When they do, the mapping gets updated in Elasticsearch. Changing an existing property is not allowed once there is data; properties can only be removed or newly created. New properties always get unique names, because a property name that has ever held data cannot be reused.
The most logical split would be on object type, as that would group all related data together, but it would lead to many tiny indices, since some object types might only have a few dozen documents. I don't think the multiple-index idea is of use for us, but it was one of the options.
Say we have 500 object types, each with 50 unique attributes. That would require us to raise the nested fields limit (index.mapping.nested_fields.limit) to 500. We would store each object type as a nested property, and the actual attributes as properties of that nested property.
A document can only be of one object type, so each document would have only one nested property filled. Each document would still become 2 documents internally: a "main" document that only points to the "nested" document, which contains the data.
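For reference, this is roughly the index definition I have in mind for the nested variant, generated for the 500 x 50 case. The type and attribute names (`TYPE0`, `ATTR0`, ...) and the `keyword` field type are placeholders.

```python
def nested_properties(schema):
    """One nested property per object type, with its attributes underneath."""
    properties = {}
    for object_type, attributes in schema.items():
        properties[object_type] = {
            "type": "nested",
            "properties": {
                f"{object_type}_{attribute}": {"type": es_type}
                for attribute, es_type in attributes.items()
            },
        }
    return properties


def index_body(schema):
    """Create-index body that raises the nested fields limit to fit all object types."""
    return {
        "settings": {"index.mapping.nested_fields.limit": len(schema)},
        "mappings": {"properties": nested_properties(schema)},
    }


# Placeholder schema: 500 object types with 50 attributes each.
schema = {
    f"TYPE{i}": {f"ATTR{j}": "keyword" for j in range(50)}
    for i in range(500)
}
body = index_body(schema)
print(len(body["mappings"]["properties"]))  # 500
```

Queries on the attributes would then have to be wrapped in a `nested` query with the object type as `path`.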
Would this introduce a huge performance impact? Is this approach worse than keeping everything at the root level and raising the total fields limit from 1000 to, say, 10000?
Mixed; it depends on the user's use case, as it's very flexible. Mostly it is updating existing properties on existing documents with new values, sometimes removing documents, and sometimes creating new ones. At most about 10% of the documents would be modified per day (most users do less).
Also, after some more local testing, it seems the information in "Limit of total fields [1000] exceeded - #17 by dadoonet" is incorrect.
For example, take ITEM, which has PRICE, ID and NAME.
When I store them on root level, we get ITEM_PRICE, ITEM_ID and ITEM_NAME, which are seen as 3 properties.
When I store them on nested level, we get ITEM, ITEM.ITEM_PRICE, ITEM.ITEM_ID and ITEM.ITEM_NAME, which are seen as 4 properties. Is it supposed to work like this or am I doing it wrong?
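To show how I am counting, here is a small sketch that reproduces the numbers I see. It assumes that the nested (or object) container itself counts as one field on top of its sub-fields; that assumption is based on my local test, I have not confirmed it is exactly how Elasticsearch computes the limit.

```python
def count_fields(properties):
    """Count every mapping entry, including object/nested containers themselves."""
    total = 0
    for definition in properties.values():
        total += 1  # the field (or container) itself
        total += count_fields(definition.get("properties", {}))  # its sub-fields
    return total


root_level = {
    "ITEM_PRICE": {"type": "double"},
    "ITEM_ID": {"type": "double"},
    "ITEM_NAME": {"type": "keyword"},
}
nested_level = {"ITEM": {"type": "nested", "properties": root_level}}

print(count_fields(root_level))    # 3
print(count_fields(nested_level))  # 4
```

With 500 object types of 50 attributes each, counting this way gives 25,000 fields at root level and 25,500 in the nested layout.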