Multitenancy modeling question


(Igor Berman) #1

Hi all,
I need advice regarding how to model data in ES:
we have several hundred customers (tenants), and each tenant has aggregated data for some time range (one month, one week, one year, etc.). Tenants vary greatly in size, from almost 0 documents up to 100M documents and even more (there are ~10 big ones), so we set the number of shards per index from 1 up to a maximum of 20.
We build an index per time range because it contains documents that represent an aggregated view of that time range (i.e. ES can't do this type of aggregation on the fly, so we pre-aggregate the data).

The question is: how to set up indices for each customer x time-range combination?

Currently we have one index per customer per time range, which gives us the flexibility to maintain each index separately.
However, reading the forum I have come to the conclusion that ES doesn't like having many indices. What are common solutions for such a scenario?
E.g. I can think of one index per time range with some routing strategy.
We have a rather small cluster (under 20 nodes), so the shards of each index are distributed among them.

Any advice will be appreciated!
Igor


(Christian Dahlqvist) #2

Having a very large number of small shards and indices can result in a lot of overhead and a large cluster state, and thereby limit scalability.

How to best address this depends a bit on the nature of the data and the resulting mappings. If the tenants have very similar or identical mappings, e.g. if the data is standardised, it is relatively easy to have many small tenants share a single index and, as suggested, use routing to ensure that only a single shard per index needs to be queried for each tenant.
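For illustration, custom routing might look like the sketch below (the index name `tenants_2024_01` and the routing value `tenant_42` are made-up examples, not anything from this thread). Indexing and searching with the same routing value keeps each tenant's documents on a single shard and directs queries to that shard only:

```
PUT tenants_2024_01/_doc/1?routing=tenant_42
{
  "tenant_id": "tenant_42",
  "metric": "page_views",
  "count": 1234
}

GET tenants_2024_01/_search?routing=tenant_42
{
  "query": {
    "term": { "tenant_id": "tenant_42" }
  }
}
```

Note that the `term` filter on the tenant field is still needed: routing only selects the shard, and several tenants can hash to the same shard.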

If mappings across tenants are not controlled or uniform it gets trickier, as the risk of mapping conflicts increases dramatically. In scenarios like this the solution can be less clean, and often involves standardising mappings and/or field names in some way. If this is not possible, an approach I have seen is to divide the problem across several small clusters rather than a single large one.


(Igor Berman) #3

Thanks Christian
Yes, tenants have exactly the same mapping.
Is there a rule of thumb for how many indices is still OK to have? Of course it depends on cluster size... but can you still give me some baseline?


(Christian Dahlqvist) #4

This will depend on a number of factors, so it is difficult to give precise guidelines. As each shard carries a certain amount of overhead in terms of file handles, memory and CPU usage, you do not want your shards to be too small. A target shard size of between a few GB and tens of GB is not uncommon in use cases with time-based indices on which aggregations are performed. The general guideline, however, is to try to keep the shard size below 50GB, as larger shards can have a negative impact on recovery.
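As a side note, one way to keep an eye on shard sizes is the cat shards API (the index pattern here is just a placeholder):

```
GET _cat/shards/tenants_*?v&h=index,shard,prirep,store&s=store:desc
```

This lists each shard's on-disk size, sorted largest first, which makes it easy to spot shards drifting past the sizing guidelines above.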

I would suggest possibly having the larger tenants in separate indices, adjusting the number of shards to ensure the shards are a reasonable size. It may very well be possible to have all the smaller tenants share a single index, again adjusting the shard count to control shard size and to make sure there are enough shards and replicas to distribute across the cluster.
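A sketch of that layout (all names and shard counts below are made-up examples, to be tuned to the actual data volumes): a dedicated index for a large tenant, and one shared index for the long tail of small tenants, queried with routing as discussed earlier in the thread.

```
PUT big_tenant_7_2024_01
{
  "settings": {
    "number_of_shards": 20,
    "number_of_replicas": 1
  }
}

PUT small_tenants_2024_01
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```

Since the mappings are identical across tenants, both kinds of index can share the same mapping definition, e.g. via an index template.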


(Igor Berman) #5

Thank you Christian for the insights!

