I'm attempting to integrate elasticsearch into a multitenant web
application. I have data segmented into tens of thousands of
'tenants', and then further subdivided by user within a tenant. I'd
like to make it so that my users can readily access any data within
their tenant, with optional visibility rules allowing finer grained
sharing (for instance, sharing certain types of data with other users
in the tenant, while retaining exclusive access to other types).

Towards this goal, I'm trying to figure out the best way of indexing
my documents within ES. My initial impulse was to create an index for
each tenant, but some cursory research indicated this was a Bad Idea.
Maintaining tens of thousands of indexes while adding more every time
a new tenant is created is almost certainly untenable. I'm stuck,
therefore, trying to decide what criteria to use when creating
indexes. I have a few ideas, mostly centering around heuristic data
such as geographic location, number of active users and so forth, but
nothing jumps out as the obviously best course of action. Regardless
of how many indexes I run and how I decide which data goes into
each, though, routing documents by tenant id seems ideal for my
needs. Can anyone offer some
advice on what kind of indexing strategy to employ for this type of
application?

Some additional information that might be relevant:
Each tenant/user has the same types of data to index, but there may
be differences in how each type is mapped. That is, a type might have
some fields for one user, and others for another, and may need to be
tokenized/analyzed differently for both. This seems to indicate that
establishing different indexes based on different type mappings may be
the way to go, but I doubt there are enough such differences to
warrant more than a handful of different indexes. Is there any
performance hit associated with putting vast amounts of data into a
small number of indexes, assuming a per-tenant id routing strategy?
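
To make the routing idea concrete, here is a rough sketch of what I have in mind. It only builds the request descriptions by hand rather than calling a client library. The index name, type name, field names, and both helper functions are made up for illustration, and I'm assuming the `routing` request parameter on the index and search APIs is the right mechanism for this.

```python
from urllib.parse import quote, urlencode


def routed_index_request(index, doc_type, doc_id, tenant_id, document):
    """Describe an index request whose routing value is the tenant id.

    Hypothetical helper: it returns a plain dict instead of sending anything.
    """
    path = "/{}/{}/{}".format(
        quote(index, safe=""), quote(doc_type, safe=""), quote(str(doc_id), safe="")
    )
    return {
        "method": "PUT",
        "url": path + "?" + urlencode({"routing": tenant_id}),
        "body": document,
    }


def routed_search_request(index, tenant_id, query_body):
    """Describe a search that passes the same tenant id as its routing value."""
    path = "/{}/_search".format(quote(index, safe=""))
    return {
        "method": "POST",
        "url": path + "?" + urlencode({"routing": tenant_id}),
        "body": query_body,
    }


# Example with made-up names: one shared index, one tenant.
index_req = routed_index_request(
    "shared-a", "invoice", 42, "tenant-123", {"tenant_id": "tenant-123", "total": 10}
)
search_req = routed_search_request(
    "shared-a", "tenant-123", {"query": {"match_all": {}}}
)
print(index_req["url"])
print(search_req["url"])
```

The document still carries its own `tenant_id` field here, since as far as I understand routing alone does not guarantee that a shard holds only one tenant's data.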

Almost all queries will need to be filtered by tenant, by user, or
by some combination of visibility rules. That said, some users need
the ability to query across all tenants, but the performance of such
queries need not be as high.
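
As an illustration of the kind of filtering I mean, a typical query body might look something like the sketch below. The field names (`tenant_id`, `owner_id`, `visibility`, `body`) and the "tenant" visibility value are placeholders I invented for this example, not my real mapping, and I'm assuming a version of ES where the `bool` query accepts a `filter` clause.

```python
def visible_docs_query(tenant_id, user_id, text):
    """Build a search body limited to one tenant and to documents the user may see.

    A document is treated as visible if the user owns it or if it is marked
    as shared with the whole tenant (hypothetical visibility rule).
    """
    return {
        "query": {
            "bool": {
                # Scored full-text part of the query.
                "must": [{"match": {"body": text}}],
                # Non-scoring restrictions: tenant first, then visibility.
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    {
                        "bool": {
                            "should": [
                                {"term": {"owner_id": user_id}},
                                {"term": {"visibility": "tenant"}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ],
            }
        }
    }


query = visible_docs_query("tenant-123", "user-7", "quarterly report")
```

A cross-tenant query for privileged users would presumably just drop the `tenant_id` term and the routing value, at the cost of hitting every shard.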

I'm using MongoDB as my data store, and see a fairly obvious
one-to-one mapping of Mongo Collection to ES document type. This suggests that
using types as a way of dividing data within an index by tenant might
not work, since I will likely need to use the types for collection

Any advice on this issue is much appreciated.