I have a product that uses Elasticsearch (ES) primarily to improve search performance. It’s currently running version 2.3.1, but we are planning to move to version 7.0.0 for a variety of reasons, not the least of which are security and support.
In doing this, I have been looking into the current structure of the ES indices for the product, and noticed that we are using mapping types heavily. I’m looking for some guidance on best practices for index creation.
The current scheme for creating indexes is as follows:
- Create an index for every domain (tenant) in a given environment, named domain_{domainId}
- Create a mapping type for every item type
  - Asset
  - Event
  - ... (there are 12 types total)
My assumption is that this structure was created to segregate data by tenant to help limit cross-domain data contamination (since all data queries would implicitly be against a single domain) as well as to support the notion of custom fields, where an Asset for Customer A might have different custom fields than an Asset for Customer B (the same core fields, but different custom fields). The removal of mapping types raises some concerns for me, however, since each of the options for dealing with it has issues.
Options
Based on the Elastic documentation, there are two options:
Index per document type
If we wanted to index per document type, we could do one of two things:
An index for each document type
We would create indices based on the mapping types above, and every index would have a domainId field that would have to be part of every query. This would require some code changes to ensure that domainId is included in each query to prevent cross-tenant data access. Also, I’m not sure how ES would deal with an Asset having custom fields that differ across items (some assets might have “field1”, while others have “field2”).
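To make the "domainId in every query" requirement concrete, here is a rough sketch of what I have in mind, written in Python purely for illustration. The helper name `scope_to_domain` is hypothetical, and it assumes domainId is mapped as a keyword or numeric field so that a `term` filter matches it exactly. It only builds the request body and does not talk to a cluster.

```python
from typing import Any, Dict


def scope_to_domain(query: Dict[str, Any], domain_id: str) -> Dict[str, Any]:
    """Wrap an arbitrary query so it can only match one tenant's documents.

    The tenant restriction goes in the bool query's filter clause, so it
    does not affect relevance scoring.
    """
    return {
        "bool": {
            "must": [query],
            # Assumption: domainId is a keyword/numeric field in the mapping.
            "filter": [{"term": {"domainId": domain_id}}],
        }
    }


# Hypothetical caller query against the shared "asset" index.
user_query = {"match": {"name": "pump"}}
search_body = {"query": scope_to_domain(user_query, "42")}
```

The idea would be to route every search through a single helper like this, so that no code path can forget the tenant filter.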
An index for each domain and document type
We could create an index for every domain/type combination. In other words, instead of domain_{domainId}/{type}, we’d use domain_{domainId}_{type}. This would create twelve indices for every tenant, and some environments have 100+ tenants, which means we would have a lot of small indices in some of these instances.
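As a quick illustration of the naming scheme and the resulting index count, here is a small Python sketch. The function names are hypothetical, and the type name is lowercased because Elasticsearch index names must be lowercase.

```python
def index_name(domain_id: int, doc_type: str) -> str:
    """Build the per-tenant, per-type index name, e.g. domain_17_asset."""
    # Index names must be lowercase, so "Asset" becomes "asset".
    return f"domain_{domain_id}_{doc_type.lower()}"


def index_count(tenants: int, types_per_tenant: int) -> int:
    """Total number of indices when every tenant gets one index per type."""
    return tenants * types_per_tenant


print(index_name(17, "Asset"))  # domain_17_asset
print(index_count(100, 12))     # 1200
```

So an environment with 100 tenants and our 12 types would end up with 1,200 indices, each with its own shards to manage.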
Custom Type Field
If we utilize the custom type field suggestion in ES, we could keep the domain-level index (domain_{domainId}) and replace the mapping type with a custom type field. This would require some modification of the ID in the index (since you could have ID collisions between objects with different types). I am also not entirely sure how (if at all) it would affect search performance.
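For reference, this is roughly how I picture the custom type field approach, sketched in Python. The helper names and the `{type}-{id}` ID format are my own assumptions for illustration, as is the field name `type`. The sketch only shapes the document and the filter, and does not call ES.

```python
from typing import Any, Dict, Tuple


def to_es_document(
    doc_type: str, original_id: int, source: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Return the (_id, body) pair to index into domain_{domainId}.

    The type is prefixed onto the ID so that Asset 1 and Event 1 no longer
    collide, and it is also stored as a regular field so it can be filtered on.
    """
    es_id = f"{doc_type}-{original_id}"
    body = dict(source)  # copy so the caller's dict is not mutated
    body["type"] = doc_type
    return es_id, body


def type_filter(doc_type: str) -> Dict[str, Any]:
    """Filter clause that stands in for the old mapping type in the URL."""
    # Assumption: "type" is mapped as a keyword field.
    return {"term": {"type": doc_type}}


asset_id, asset_body = to_es_document("asset", 1, {"name": "pump"})
event_id, event_body = to_es_document("event", 1, {"name": "inspection"})
```

With this, the two documents that both had ID 1 under separate mapping types get the distinct IDs `asset-1` and `event-1` in the single index.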
The initial implementation was done by folks who are no longer here, so I'm just casting a wide net to the community to see what others' experiences have been. My current inclination is to have an index for each object type (asset, event, etc.) and use a domain_id field for tenant data segregation. My concern with this approach is how Elasticsearch would handle storing and querying custom fields.