How to implement a multi-tenant environment in Elasticsearch

What approach does the Elasticsearch community recommend for a multi-tenant environment?

Is a single-index approach good? What are the pros and cons?

Thanks,
Harshal

It depends on the use case: specifically, how many tenants you need to support, how similar the data is across tenants, and how you need to manage retention for different tenants.

Thanks @Christian_Dahlqvist

Could you please let me know your thoughts on the below examples?

E.g.
Approach 1: 4TB of data with 1000 tenants - a single index per tenant
Approach 2: 4TB of data with 1000 tenants - 100 indices per tenant

Which would be the better approach for scalability and cost-effectiveness?

Is that 4TB of data in total or 4TB of data per tenant?

The main factors that drive how you handle multi-tenancy in Elasticsearch are, however, not necessarily just data volume. Could you please answer the following questions?

What type of data is it?

Is it the same type of data across all tenants? Does each tenant only have one type of data?

Do you have control over the format or is this up to the tenants?

Do all tenants follow the same data retention pattern?

How are the tenants accessing and querying the data?

Is that 4TB of data in total or 4TB of data per tenant?

It is 4TB of data in total, across all tenants.

The main factors that drive how you handle multi-tenancy in Elasticsearch are, however, not necessarily just data volume. Could you please answer the following questions?

What type of data is it?

Plain text / JSON data of Configure, Price, Quote, Product, Customer

Is it the same type of data across all tenants? Does each tenant only have one type of data?

Every tenant has their own types of data (Plain text / JSON data)

Do you have control over the format or is this up to the tenants?

This is up to the customer.
E.g. a single tenant should have separate indices for Product, Category and PriceListItem.

Do all tenants follow the same data retention pattern?

Yes, all tenants follow the same data retention pattern

How are the tenants accessing and querying the data?

24,000 customers/tenants and over 5 million users are accessing 1.3 million documents every day.

Does the data have a fixed retention period, e.g. deleted after 3 months or never deleted at all?

That is not what I meant. Do customers have direct access to the Elasticsearch API? Do they query data through an API that you control? Are they using Kibana to browse data?

How many different types of data can a tenant have?

That is not what I meant. Do customers have direct access to the Elasticsearch API? Do they query data through an API that you control? Are they using Kibana to browse data?

Yes we have written our own APIs that make queries using Elasticsearch client.

How many different types of data can a tenant have?

A single tenant has 100-200 types.

Does the data have a fixed retention period, e.g. deleted after 3 months or never deleted at all?

Yes, all tenants have the same retention period.

What is the retention period?

When tenants query data, are you just filtering, or do tenants perform free-text searches where relevancy is important?

Do tenants query across multiple types of data in a single query?

Elasticsearch generally does not scale or perform well with an index per tenant in a multi-tenant situation, at least not unless the number of tenants is reasonably low. Having lots of very small indices and shards adds overhead and is inefficient. The default limit of 1000 shards per node is there for a very good reason: to prevent users from oversharding. When handling a large number of tenants, as in your scenario, it is therefore common to group tenants into one or more indices.

As you have an API tier, you can add a filter on a tenant ID field in your documents in order to prevent one tenant from seeing another tenant's data.
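A minimal sketch of that API-tier filter, assuming every document carries a keyword field named `tenant_id` (the field name and the helper `scope_to_tenant` are made up for illustration). It only builds the search body; you would pass the result to whichever Elasticsearch client you already use.

```python
from copy import deepcopy


def scope_to_tenant(tenant_id, user_query, tenant_field="tenant_id"):
    """Wrap a tenant-supplied query so it can only match that tenant's documents.

    tenant_field is an assumed field name; it should be mapped as keyword
    so the term filter matches the ID exactly.
    """
    return {
        "query": {
            "bool": {
                # The tenant's own query still contributes to scoring.
                "must": [deepcopy(user_query)],
                # The filter clause restricts results without affecting the score.
                "filter": [{"term": {tenant_field: tenant_id}}],
            }
        }
    }


body = scope_to_tenant("tenant-42", {"match": {"name": "laptop"}})
```

The important part is that the tenant ID comes from your own authentication layer, never from the request payload the tenant controls.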

The second, and possibly most important, aspect of multi-tenancy is mapping management. Elasticsearch is not schema-less, so if tenants can ingest data in any form they like, you are likely to encounter mapping conflicts where new data cannot be indexed. Because of this it is common to group data based on the type of data being indexed. If you e.g. knew that tenants could each have 200 types of data, and that these types were the same across tenants with no schema conflicts, you could create one index per data type and have all tenants share that index for that type of data. As described earlier, you would manage access and filtering in your API tier.
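As a rough sketch of the index-per-type layout, the snippet below builds an index name and a create-index body for one shared data type. The naming scheme, the `tenant_id` field and the example fields are assumptions for illustration. Setting `dynamic` to `strict` is one possible way to reject documents with unmapped fields instead of letting tenants change the mapping; it is an option, not something this thread prescribes.

```python
def index_for(data_type):
    """One shared index per data type; index names must be lowercase."""
    return data_type.lower()


def create_index_body(properties):
    """Build a create-index body with an explicit mapping.

    The tenant_id keyword field (assumed name) is added to every index
    so the API tier can filter on it.
    """
    return {
        "mappings": {
            # Reject documents that contain fields missing from the mapping.
            "dynamic": "strict",
            "properties": {"tenant_id": {"type": "keyword"}, **properties},
        }
    }


# Hypothetical fields for a shared Product index.
product_index = index_for("Product")
product_body = create_index_body(
    {"name": {"type": "text"}, "price": {"type": "double"}}
)
```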

The amount of data does not really determine how you organise your data and tenants; it determines how you size the different indices with respect to primary and replica shards. This is why I was asking questions about the tenants and the data types.
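To illustrate how volume feeds into shard sizing rather than index layout, here is a back-of-the-envelope calculation. It assumes the 4TB is spread evenly over 200 per-type indices and uses a target of 30GB per primary shard; both are assumptions, and the right target depends on your data and queries.

```python
import math


def primary_shards(index_size_gb, target_shard_gb=30):
    """Number of primary shards needed to stay near the target shard size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


total_gb = 4 * 1024          # 4TB in total, as stated in this thread
data_types = 200             # one shared index per data type
per_index_gb = total_gb / data_types   # 20.48GB if evenly distributed

shards_per_index = primary_shards(per_index_gb)   # 1
```

In practice the types will not be the same size, so a few large indices may need several primaries while most need only one.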

Thank you @Christian_Dahlqvist for the information.

Could you please suggest which approach I should go for in the above use case?

1/ Shall I create one Elasticsearch index per type, per tenant?
E.g. for 24000 tenants and 200 types

Shall I create 24000 * 200 = 4800000 indices in Elasticsearch? Is that feasible?

If the type of data is the same across tenants, I would recommend creating one index per type, e.g. 200 indices in total, and having all tenants share these indices. You may group tenants into indices, but creating indices per tenant will likely neither scale nor perform well.
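If you do want to group tenants into a fixed number of indices per type, one possible scheme is to hash the tenant ID into a bucket. This is only a sketch: the bucket count of 8, the naming pattern and the helper names are assumptions, and changing the bucket count later means reindexing.

```python
import hashlib


def tenant_bucket(tenant_id, buckets=8):
    """Map a tenant ID to a stable bucket number in [0, buckets)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % buckets


def index_name(data_type, tenant_id, buckets=8):
    """Index shared by all tenants that fall into the same bucket."""
    return f"{data_type.lower()}-{tenant_bucket(tenant_id, buckets):02d}"


name = index_name("Product", "tenant-42")
```

A cryptographic hash is used here only because it is stable across processes and restarts, unlike Python's built-in `hash()` for strings.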

This does however depend on the nature of your data, which we know nothing about.

No. That is not feasible.
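The arithmetic behind that answer, as a quick sketch. It assumes the smallest reasonable layout of one primary and one replica shard per index and the default limit of 1000 shards per node mentioned above.

```python
import math

tenants = 24_000
types_per_tenant = 200
replicas = 1                  # assumed: one replica per primary
max_shards_per_node = 1000    # default limit discussed in this thread

# One index per tenant per type, one primary shard each.
indices = tenants * types_per_tenant              # 4,800,000
shards = indices * (1 + replicas)                 # 9,600,000
nodes_needed = math.ceil(shards / max_shards_per_node)   # 9,600

# One shared index per type instead.
shared_indices = types_per_tenant                 # 200
shared_shards = shared_indices * (1 + replicas)   # 400
```

Even before considering the overhead of each tiny shard, the per-tenant layout would need thousands of nodes just to stay under the shard limit for 4TB of data.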

I really appreciate your time and information, @Christian_Dahlqvist. Thanks again.

Regards
Harshal

Yes, in some use cases.

@Christian_Dahlqvist is there any alternative? I do not want to keep data from multiple tenants in a single per-type index.

Having an index per tenant does, as I mentioned earlier, sometimes work as long as the number of tenants is low. With a large and potentially growing number of tenants this approach does not work in Elasticsearch. This has come up numerous times before and you can search the forum and see that the advice is the same.

One way I have seen suggested in the past, but which I would not recommend, is to have many separate small clusters that each host a manageable number of tenants.

Given that you have multiple data types with potentially conflicting mappings I believe it makes sense to group indices by data type rather than tenant in this case.

What are the security requirements? Limiting user access to the appropriate data within a shared index can be done with document level security.
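A sketch of what a document level security role could look like, built as a Python dict you would send to the role API. It assumes each end user exists as an Elasticsearch user whose metadata contains a `tenant_id` entry, and that documents have a matching `tenant_id` keyword field; both names are assumptions, and the role format should be checked against the security documentation for your version.

```python
def tenant_role(index_names, tenant_field="tenant_id"):
    """Role body that limits reads to documents of the user's own tenant.

    The templated query is resolved per user from the (assumed)
    tenant_id entry in that user's metadata.
    """
    return {
        "indices": [
            {
                "names": list(index_names),
                "privileges": ["read"],
                "query": {
                    "template": {
                        "source": {
                            "term": {
                                tenant_field: "{{_user.metadata.tenant_id}}"
                            }
                        }
                    }
                },
            }
        ]
    }


role = tenant_role(["product", "category", "pricelistitem"])
```

With a templated query a single role can serve all tenants, which avoids creating one role per tenant. If users only ever reach Elasticsearch through your own API with a shared service account, the filter in the API tier does the same job.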

24000 tenants is not trivial for a new Elastic designer.

Thanks, @rugenl, could you please suggest an appropriate solution? as per
