I am trying to build an Elasticsearch cluster (3 master nodes and 2 data nodes to start with, expandable up to 20 data nodes) capable of hosting a large number (4000-4500) of tenants, who will index documents of different types. Most of the tenants (say 4000) are expected to have only 1 or 2 document types to index,
whereas a small number of them (say 500) will have up to 5 document types. The average document size is around 1 KB, and each document could contain up to 30-50 fields.
What is the best approach?
Approach 1: One index (say 1 primary shard, 1 replica) per document type per tenant.
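A minimal sketch of what Approach 1 means for index layout, assuming a hypothetical `{tenant}-{doctype}` naming scheme (the helper names `approach1_index` and `indexes_for_tenant` are illustrative, not part of any library). Index names are lowercased because Elasticsearch requires lowercase index names; the create-index body only sets `number_of_shards` and `number_of_replicas`.

```python
from typing import Any, Dict, List, Tuple


def approach1_index(tenant_id: str, doc_type: str) -> Tuple[str, Dict[str, Any]]:
    """Return a hypothetical index name and create-index body for one tenant/type pair."""
    # Elasticsearch index names must be lowercase, so normalise both parts.
    name = f"{tenant_id}-{doc_type}".lower()
    # 1 primary + 1 replica => 2 shards per index, as in Approach 1.
    body = {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}
    return name, body


def indexes_for_tenant(tenant_id: str, doc_types: List[str]) -> List[str]:
    """One index per document type: a tenant with 5 types gets 5 indexes (10 shards)."""
    return [approach1_index(tenant_id, t)[0] for t in doc_types]


print(indexes_for_tenant("Acme", ["invoice", "order"]))
```

Every new document type a tenant adds means one more index, which is why the shard count in this approach grows with the total number of types rather than the number of tenants.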
Approach 2: One index (say 1 primary shard, 1 replica) per tenant. Use an Elasticsearch type to represent each document type (i.e., one index will have multiple types). Mandate that same-named fields in different document types within the same index have the same datatype.
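The "same-named fields must have the same datatype" rule in Approach 2 could be enforced in the application layer before a tenant's new type mapping is sent to Elasticsearch. A minimal sketch, assuming a hypothetical, simplified mapping representation of `{doc_type: {field_name: datatype}}` for one tenant index (real Elasticsearch mappings are richer nested JSON):

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified view of one tenant index: {doc_type: {field_name: datatype}}.
Mappings = Dict[str, Dict[str, str]]


def find_field_conflicts(mappings: Mappings) -> List[Tuple[str, Tuple[str, ...]]]:
    """List fields declared with more than one datatype across the types of one index."""
    seen: Dict[str, Set[str]] = {}
    for fields in mappings.values():
        for field_name, datatype in fields.items():
            seen.setdefault(field_name, set()).add(datatype)
    return sorted(
        (name, tuple(sorted(types))) for name, types in seen.items() if len(types) > 1
    )


tenant_index = {
    "invoice": {"amount": "double", "customer": "string"},
    "order": {"amount": "long", "customer": "string"},
}
print(find_field_conflicts(tenant_index))
```

Here `amount` is flagged because it is `double` in one type and `long` in another, so the tenant's `order` mapping would be rejected before it reaches the cluster.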
Approach 3: Dynamically create indexes (say 5 primary shards and 1 replica, each index hosting up to 50 types) that can host types from multiple tenants. Use an internal field-naming strategy (in my application layer) to ensure that fields in the document types of different tenants are uniquely named. Use appropriate routing to ensure that documents from the same tenant go to the same shard.
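One possible shape for the field-naming strategy and routing in Approach 3 is sketched below. It assumes a hypothetical `tenant__name` prefix convention (tenant ids are assumed to be simple alphanumeric strings that never contain `__`), prefixes the type name as well so two tenants' `invoice` types do not collide in a shared index, and passes the tenant id as the `routing` URL parameter of an index request. None of the helper names come from an Elasticsearch client; only the path and body are built, no request is sent.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote

SEP = "__"  # hypothetical separator; must not occur inside tenant ids


def internal_type(tenant_id: str, doc_type: str) -> str:
    """Type name that stays unique when several tenants share one index."""
    return f"{tenant_id}{SEP}{doc_type}"


def internal_field(tenant_id: str, field_name: str) -> str:
    """Field name that stays unique across tenants, so same-named fields never clash."""
    return f"{tenant_id}{SEP}{field_name}"


def to_internal(tenant_id: str, doc: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix every top-level field (nested objects are left untouched in this sketch)."""
    return {internal_field(tenant_id, k): v for k, v in doc.items()}


def from_internal(tenant_id: str, doc: Dict[str, Any]) -> Dict[str, Any]:
    """Strip the tenant prefix again before returning results to the tenant."""
    prefix = tenant_id + SEP
    return {k[len(prefix):]: v for k, v in doc.items() if k.startswith(prefix)}


def index_request(index: str, tenant_id: str, doc_type: str, doc_id: str,
                  doc: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Build the path and body for PUT /{index}/{type}/{id}?routing={tenant}.

    Using the tenant id as the routing value sends all of a tenant's documents
    in this index to the same primary shard.
    """
    path = "/{}/{}/{}?routing={}".format(
        quote(index, safe=""),
        quote(internal_type(tenant_id, doc_type), safe=""),
        quote(doc_id, safe=""),
        quote(tenant_id, safe=""),
    )
    return path, to_internal(tenant_id, doc)


path, body = index_request("shared-0001", "t42", "invoice", "1", {"amount": 10, "customer": "x"})
print(path)
print(body)
```

Searches for that tenant would need the same `routing` value so that only the tenant's shard is queried, plus the prefixed field names in every query.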
I do not use any parent/child relationships yet.
The issues that I see with each approach are below.
Approach 1: This will create a cluster that needs to handle up to 21000 shards (4000 × 2 (types) × 2 (shards) + 500 × 5 (types) × 2 (shards)). Given that each shard consumes resources, is this the right approach?
Approach 2: Slightly better than Approach 1, given that all document types from a tenant go into the same index. The cluster will still need to handle up to 9000 shards (4000 × 2 (shards) + 500 × 2 (shards)). However,
https://github.com/elastic/elasticsearch/issues/15613 indicates that types might go away in the future. Given that, is this the right approach?
Approach 3: Given that 50 types go into a single index, the shard requirement for the cluster is down to 2100 shards. However, if Elasticsearch stops allowing multiple types in the same index in the future, am I backing myself into a corner?
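The three shard counts can be reproduced with a short calculation. It assumes the worst case (every small tenant uses 2 types, every large tenant uses 5) and, for the per-node figures, an even spread of shards over the 2 initial and 20 maximum data nodes; the constant and function names are illustrative only.

```python
import math

SMALL_TENANTS, SMALL_TYPES = 4000, 2  # tenants with 1-2 types (worst case: 2)
LARGE_TENANTS, LARGE_TYPES = 500, 5   # tenants with up to 5 types
TYPES_PER_SHARED_INDEX = 50           # Approach 3


def total_shards(indexes: int, primaries: int, replicas: int) -> int:
    """Each primary has `replicas` copies, so an index holds primaries * (1 + replicas) shards."""
    return indexes * primaries * (1 + replicas)


total_types = SMALL_TENANTS * SMALL_TYPES + LARGE_TENANTS * LARGE_TYPES

approach1 = total_shards(total_types, primaries=1, replicas=1)
approach2 = total_shards(SMALL_TENANTS + LARGE_TENANTS, primaries=1, replicas=1)
approach3 = total_shards(math.ceil(total_types / TYPES_PER_SHARED_INDEX), primaries=5, replicas=1)

for name, count in [("Approach 1", approach1), ("Approach 2", approach2), ("Approach 3", approach3)]:
    # Shards per data node, assuming an even spread over 2 and 20 data nodes.
    print(name, count, {n: math.ceil(count / n) for n in (2, 20)})
```

With 2 data nodes, Approach 1 would place roughly 10500 shards on each node, versus about 1050 per node for Approach 3.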
Which approach do you think is best? Or are there other approaches that would satisfy my multi-tenant use case?