Our data currently lives in MongoDB, and it will stay there for ACID compliance. However, we are moving our search capabilities to Elasticsearch. Because our app is multi-tenant, our MongoDB architecture uses one instance per tenant. After we move to Elasticsearch, can we partition the data by tenant and then, within each tenant, shard by our application-specific shard_key? Or do we need to use both the tenant ID and the app key ID together as the shard key?
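To make the "partition by tenant, then shard by shard_key" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes one index per tenant and uses Elasticsearch custom routing with the app's shard_key inside that index. The index naming scheme ("search-tenant-<id>") and the document fields ("id", "shard_key", "title") are hypothetical. The function only builds an NDJSON body for the _bulk API and does not talk to a cluster.

```python
import json


def tenant_index(tenant_id: str) -> str:
    # Hypothetical naming convention: one index per tenant.
    return f"search-tenant-{tenant_id}"


def bulk_lines(tenant_id: str, docs: list[dict]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    The tenant is the partition (it picks the index). Inside that index,
    each document is routed by the application-specific shard_key, so
    documents sharing a shard_key land on the same primary shard.
    """
    lines = []
    for doc in docs:
        action = {
            "index": {
                "_index": tenant_index(tenant_id),
                "_id": doc["id"],
                "routing": doc["shard_key"],
            }
        }
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # source line
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, bulk_lines("42", [{"id": "a1", "shard_key": "acct-7", "title": "hello"}]) produces one action line targeting "search-tenant-42" with routing "acct-7", followed by the document itself. Note that if each tenant index has only one primary shard, the routing value no longer changes where documents are placed; the index boundary alone provides the tenant partition.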
Another reason I'm asking: we have roughly 200 tenants, and if I sum the data storage for all of them, it comes to about 1 TB. (FYI: one customer can have multiple tenants, so with at most 4 tenants per customer, these ~200 tenants may represent about 50 unique customers.) The "Size your shards" page of the Elasticsearch Guide [8.14] says: "Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB". Dividing 1 TB by 50 GB gives roughly 20 shards. Since I need to keep the pricing of our managed Elasticsearch cluster on the lower side, can this sizing calculation be improved? I believe retention for this collection in MongoDB is currently forever and we don't delete any data, but we are also introducing a 1-year retention window, so 1 TB of total data seems reasonable. MongoDB compresses data, and I don't remember whether the 1 TB figure is compressed or actual bytes.
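The 1 TB / 50 GB arithmetic can be written out as a small helper that applies both limits from the "Size your shards" guidance (shard size and document count) and takes whichever requires more primary shards. This is a rough sketch: it assumes decimal units (1 TB = 10^12 bytes), and the document count is an input you would have to measure. It counts primary shards only; replicas multiply storage, and the on-disk size in Elasticsearch can differ from MongoDB's storage size because of different compression and indexing.

```python
import math


def primary_shards_needed(total_bytes: float, total_docs: int,
                          max_shard_bytes: float = 50e9,
                          max_shard_docs: int = 200_000_000) -> int:
    """Estimate primary shards from the size and document-count guidance."""
    by_size = math.ceil(total_bytes / max_shard_bytes)
    by_docs = math.ceil(total_docs / max_shard_docs)
    # Every index needs at least one primary shard.
    return max(1, by_size, by_docs)
```

With 1 TB at the 50 GB upper bound this gives 20 primary shards; at the 10 GB lower bound it gives 100. If the data held more than 4 billion documents, the 200M-documents-per-shard limit would dominate instead (for example, 5 billion documents would need 25 shards).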
Hi @Moni_Hazarika!
Multi-tenancy is a topic that comes up quite often with Elasticsearch. There are multiple strategies for implementing it; I'd recommend taking a look at this post, specifically this comment, which covers the tradeoffs between the strategies.
Thanks @Carlos_D for your reply, and sorry for the delayed response. I have been evaluating the options discussed in the post, and for our use case, with roughly 300 tenants per region, one cluster with one index per tenant seems fine. We are thinking of starting with a single primary shard of up to 50 GB per index. That also keeps us below the limit of 1000 shards per node. Let me know your thoughts.
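A quick way to sanity-check the one-index-per-tenant plan is to count total shard copies against the cluster-wide limit. Elasticsearch computes that limit as cluster.max_shards_per_node (default 1000) times the number of non-frozen data nodes. The sketch below assumes one replica per primary and a hypothetical 2 data nodes; both are illustrative choices, not recommendations. It also computes the average data per tenant, assuming the ~1 TB total from earlier is spread across the tenants.

```python
def shard_budget(num_tenants: int, primaries_per_index: int, replicas: int,
                 data_nodes: int, max_shards_per_node: int = 1000) -> dict:
    """Compare total shard copies (primaries + replicas) with the cluster limit."""
    total = num_tenants * primaries_per_index * (1 + replicas)
    limit = data_nodes * max_shards_per_node
    return {"total_shards": total, "cluster_limit": limit,
            "within_limit": total <= limit}


def avg_gb_per_tenant(total_gb: float, num_tenants: int) -> float:
    """Average data per tenant index (one primary shard each)."""
    return total_gb / num_tenants
```

With 300 tenants, one primary and one replica each, and 2 data nodes, this gives 600 shards against a limit of 2000. The average tenant would hold about 3.3 GB, well under 50 GB, so a single primary shard per tenant index looks plausible. Unusually large tenants may still need more primaries, and many very small shards add per-shard overhead.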