We are evaluating Elasticsearch as part of a large multi-tenant
system. We currently have about 8,000 tenants and hope to grow to
around 20-30k in the next few years. Each tenant would have between a
few thousand and 10M+ documents, with an average of roughly 500K
documents. The documents are all fairly small (5 KB or less) and
most of them are basically exported database table rows. We
have multiple base document types common to all tenants and we want to
allow tenants to customize their types and even create new ones. What
would be the best way to use multiple indexes and multiple types to
manage these documents? Originally, I had thought of one index per
tenant and one Elasticsearch type per type in the tenant's data model,
but there's been some traffic on this list that makes me think that
approach would be problematic. It sounds like the overhead of one
index per tenant would be too much for Elasticsearch with thousands of
tenants? It also sounds like types that share a field name could
produce unpredictable query results, since they are in the same
Lucene index?
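
For a sense of scale, here is a quick back-of-envelope calculation
using only the figures above. It treats 5 KB as 5,000 bytes and as the
per-document worst case, and it ignores index overhead, replicas, and
compression, so the totals are rough upper bounds on raw data rather
than real index sizes.

```python
# Back-of-envelope sizing from the figures in this post.
AVG_DOCS_PER_TENANT = 500_000   # stated average
MAX_DOC_BYTES = 5 * 1000        # "5kb or less", assumed to mean 5,000 bytes


def estimate(tenants):
    """Return (total documents, worst-case raw size in TB)."""
    docs = tenants * AVG_DOCS_PER_TENANT
    raw_tb = docs * MAX_DOC_BYTES / 1e12
    return docs, raw_tb


for tenants in (8_000, 20_000, 30_000):
    docs, tb = estimate(tenants)
    print(f"{tenants:>6} tenants: {docs / 1e9:.1f}B docs, up to ~{tb:.0f} TB raw")
```

So the system would hold about 4 billion documents today and 10 to 15
billion at the target tenant count.
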
If I'm correct on those last two assumptions, do I need to add a
tenant ID field to every document and keep one index? I'm concerned
that such a large index would significantly degrade query speed. I'm
also unsure how to handle tenant customization. What if one tenant
wants to add a field called "foo" to one of their types and another
wants to add a field called "foo" to a different type? Or what if
both add it to the same type, but want the field analyzed or typed
differently?
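
To make the single-index idea concrete, here is a rough sketch of the
document and query shapes it would imply. Everything in it is an
assumption for illustration: the field names `tenant_id` and
`doc_type`, the helper functions, and especially the `t<id>__` prefix,
which is just one hypothetical way to keep one tenant's "foo" from
colliding with another's. The query body uses the `bool` query with a
`filter` clause; older Elasticsearch releases expressed filtering with
a different syntax, so treat the exact shape as version-dependent.

```python
import json


def custom_field(tenant_id, name):
    # Hypothetical convention: prefix tenant-defined fields with the
    # tenant ID so each tenant's "foo" gets its own field name.
    return f"t{tenant_id}__{name}"


def tenant_doc(tenant_id, doc_type, base_fields, custom_fields):
    # Every document carries its tenant ID and logical type as plain
    # fields (assumed names), since all tenants share one index.
    doc = {"tenant_id": tenant_id, "doc_type": doc_type}
    doc.update(base_fields)
    for name, value in custom_fields.items():
        doc[custom_field(tenant_id, name)] = value
    return doc


def tenant_query(tenant_id, field, text):
    # Restrict every search to one tenant with a term filter, then
    # run the actual full-text match inside that subset.
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"tenant_id": tenant_id}}],
                "must": [{"match": {field: text}}],
            }
        }
    }


doc = tenant_doc(42, "customer", {"name": "Acme"}, {"foo": "bar"})
query = tenant_query(42, custom_field(42, "foo"), "bar")
print(json.dumps(doc, indent=2))
print(json.dumps(query, indent=2))
```

The prefix avoids conflicting field definitions, but it also means the
number of distinct fields grows with the number of tenants, which may
be its own problem at this scale.
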
Sorry for all the questions. I'm just hoping to start a discussion
and get a sense of how many potential areas of concern there are with
our plan. If you feel strongly that Elasticsearch is not right for
this problem, please say so.