How to implement a multi-tenant environment in Elasticsearch

Elasticsearch generally does not scale or perform well with one index per tenant in a multi-tenant situation, at least not unless the number of tenants is reasonably low. Having lots of very small indices and shards adds overhead and is inefficient. The default limit of 1000 shards per node is there for a very good reason: to prevent users from oversharding. When handling a large number of tenants, as in your scenario, it is therefore common to group tenants into one or more shared indices.

As you have an API tier, you can add a filter on a tenant ID field in your documents in order to prevent one tenant from seeing another tenant's data.
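As a rough sketch of what that filter could look like in the API tier: the field name `tenant_id` and the helper `scope_query_to_tenant` are my own placeholders, and I am assuming the field is mapped as `keyword` so a `term` query matches it exactly. The function only builds the request body, so you would pass the result to whichever client you use.

```python
from typing import Any


def scope_query_to_tenant(user_query: dict[str, Any], tenant_id: str) -> dict[str, Any]:
    """Wrap a tenant-supplied query so it can only match that tenant's documents.

    Assumes every document carries a `tenant_id` keyword field (hypothetical name).
    """
    return {
        "query": {
            "bool": {
                # The tenant's own query still drives matching and scoring.
                "must": [user_query],
                # The filter clause restricts results without affecting the score.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }


body = scope_query_to_tenant({"match": {"title": "invoice"}}, "tenant-42")
```

The important part is that the tenant ID comes from the authenticated session in your API tier and never from the request payload itself.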

The second, and possibly most important, aspect of multitenancy is mapping management. Elasticsearch is not schema-less, so if tenants can ingest data in any form they like, you are likely to encounter mapping conflicts where new data cannot be indexed. Because of this it is common to group data based on the type of data being indexed. If you knew, for example, that tenants could each have 200 types of data and that these types were the same across tenants with no schema conflicts, you could create one index per data type and have all tenants share that index for that type of data. As described earlier, you would manage access and filtering in your API tier.
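To illustrate the index-per-data-type idea, here is a minimal sketch of how the API tier might pick the target index and stamp the tenant onto each document before indexing. The type names, the `shared-` prefix, the `tenant_id` field, and the function names are all made up for the example.

```python
from typing import Any

# Hypothetical set of data types that are identical across all tenants.
KNOWN_TYPES = {"invoice", "order"}


def index_name_for(data_type: str) -> str:
    """Map a data type to the single index shared by all tenants."""
    if data_type not in KNOWN_TYPES:
        # Rejecting unknown types keeps unexpected fields out of shared mappings.
        raise ValueError(f"unknown data type: {data_type}")
    return f"shared-{data_type}"


def prepare_document(tenant_id: str, data_type: str, doc: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return the target index and a copy of the document tagged with its tenant."""
    return index_name_for(data_type), {**doc, "tenant_id": tenant_id}


index, tagged = prepare_document("tenant-42", "invoice", {"amount": 10})
```

With this layout the number of indices depends on the number of data types rather than the number of tenants.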

The amount of data does not really determine how you organise your data and tenants; rather, it determines how you size the different indices with respect to primary and replica shards. This is why I was asking questions about the tenants and the data types.
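For the sizing part, a back-of-the-envelope calculation could look like the sketch below. The target shard size is an input you choose yourself (the 30 GB used here is just an illustrative figure, not a number from this thread), and the function names are hypothetical.

```python
import math


def primary_shards_for(index_size_gb: float, target_shard_size_gb: float) -> int:
    """Estimate the primary shard count for one shared index."""
    # Always at least one primary, even for very small indices.
    return max(1, math.ceil(index_size_gb / target_shard_size_gb))


def total_shards(primaries: int, replicas: int) -> int:
    """Each primary has `replicas` copies, and all of them count towards shard limits."""
    return primaries * (1 + replicas)


primaries = primary_shards_for(index_size_gb=120, target_shard_size_gb=30)
shards = total_shards(primaries, replicas=1)
```

You would run this per data-type index, using the expected data volume across all tenants that share it.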
