Sizing and configuration for a multi-tenant application


I've been using ES in production for a long time for our application, but it is currently single-tenant, so I have dedicated ES instances running for each client. We are moving towards a multi-tenant platform and I just want to make sure I'm approaching this well from a hardware/config perspective.

Currently, I'm storing a total of roughly 7TB of data across all these tenants. The plan is to move this all into a multi-tenant system. This could all be in one huge cluster, or we could separate customers per cluster - it's not too tricky either way. Looking for best recommendations here to support future growth as well.

For queries and indexing, we generate a routing key for each tenant to force their data onto a specific shard. I'm currently planning on having three dedicated master nodes, but I'm having a hard time finding a good recommendation on hardware sizing for them. Am I right in thinking that dedicated master nodes don't need a ton of resources compared to data nodes?
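For anyone following along, here is a minimal sketch of what routing by tenant looks like at the request level. It only builds the HTTP method, path and body, so it runs without a cluster. The `routing` query parameter is the standard one on the index and search APIs; the `tenant_id` field name and the helper function names are my own assumptions, not something from our actual application. The extra `term` filter is there because several tenants can hash to the same shard, so routing alone does not isolate one tenant's documents.

```python
from urllib.parse import quote, urlencode


def routed_index_request(index, doc_id, tenant_id, doc):
    """Build (method, path, body) for indexing one document with custom routing."""
    # Store the tenant on the document too ("tenant_id" is a hypothetical field name).
    body = dict(doc, tenant_id=tenant_id)
    path = f"/{quote(index)}/_doc/{quote(doc_id)}?{urlencode({'routing': tenant_id})}"
    return "PUT", path, body


def routed_search_request(index, tenant_id, query):
    """Build (method, path, body) for a search limited to one tenant."""
    body = {
        "query": {
            "bool": {
                "must": [query],
                # Routing picks the shard; the filter keeps out other tenants
                # whose routing key happens to land on the same shard.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }
    path = f"/{quote(index)}/_search?{urlencode({'routing': tenant_id})}"
    return "POST", path, body
```

The `term` filter assumes `tenant_id` is mapped as a keyword field.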

For the data nodes:

  • Based on 7TB of data (and growing), what would you recommend as a quantity of data nodes and hardware specs? I'm assuming many boxes, each with 64GB of RAM and a <32GB heap, is optimal, but is there a point at which it is more sensible to have multiple independent clusters rather than trying to scale one cluster out?
  • What are the critical statistics to monitor to know when we should add another data node?
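To make the sizing question concrete, here is a rough back-of-envelope sketch. Everything except the 7TB and 64GB figures is an assumption on my part: one replica, 2TB of disk per node, and keeping disks below 85% full (which matches the default low disk watermark). The heap helper follows the usual guidance of at most half of RAM and staying under the roughly 32GB compressed-pointers threshold; the exact ceiling of 31 is an assumed safe value, not a measured one.

```python
import math


def estimate_data_nodes(primary_tb, replicas, disk_per_node_tb, max_disk_used=0.85):
    """Rough node count from storage alone; ignores CPU, heap and query load."""
    total_tb = primary_tb * (1 + replicas)           # primaries plus replica copies
    usable_per_node_tb = disk_per_node_tb * max_disk_used
    return math.ceil(total_tb / usable_per_node_tb)


def heap_gb(ram_gb, ceiling_gb=31):
    """At most half of RAM, capped below the compressed-pointers threshold."""
    return min(ram_gb // 2, ceiling_gb)


print(estimate_data_nodes(primary_tb=7, replicas=1, disk_per_node_tb=2))  # 9
print(heap_gb(64))  # 31
```

This is a storage-only floor. Disk usage per node is also one of the numbers to watch when deciding whether to add a node, alongside heap pressure and search/indexing latency.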

For indexing:

  • We currently have one index per object type in our application (e.g. all users are in a users index, all payments are in a payments index, etc.). I just want to make sure that's the best approach. In our legacy app, we have an entities index with each object being a separate type, but it appears that is not recommended anymore.
  • Based on this, I feel like we should have a relatively low shard count, as each index doesn't have a ton of data in it (we probably have 100+ indexes at this point.) That being said, with it being multi-tenant now, it will be multiplied by the number of customers. Thoughts?

Anything else I should bear in mind in this transition? Appreciate any feedback!

Bump... hoping someone has some input :)

How many tenants do you have? How fast do you expect this to grow? How many different types of indices? How many shards in total?

About 450 today, we're adding roughly 10-15 a month.

The way it is set up today, we have around 250.

Interested in feedback here as to the best approach. I haven't settled on a specific configuration.

Having that many index types for each tenant will not scale, so in my opinion it is not an option. If you do not have any mapping conflicts between the object types, I would recommend storing them all in a single index and simply adding a field that indicates the type, which you can then filter on. In that case you may be able to get away with an index per tenant, as long as growth is not too quick.
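As a minimal sketch of the "single index plus a type field" idea: each document carries its object type in an ordinary field, and queries add a `term` filter on it. The field name `object_type` and the helper names are hypothetical, and the filter assumes the field is mapped as a keyword.

```python
def tag_with_type(doc, object_type):
    """Return a copy of the document carrying its object type as a plain field."""
    return dict(doc, object_type=object_type)


def query_for_type(object_type, query=None):
    """Wrap a query so it only matches documents of one object type."""
    return {
        "query": {
            "bool": {
                "must": [query if query is not None else {"match_all": {}}],
                "filter": [{"term": {"object_type": object_type}}],
            }
        }
    }


payment = tag_with_type({"amount": 12.5}, "payment")
search_body = query_for_type("payment", {"range": {"amount": {"gte": 10}}})
```

This only works cleanly if no two object types use the same field name with different mappings, which is the mapping-conflict caveat above.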

If that is not an option I suspect you will need to continue with your current setup and handle filtering at the application layer in front of Elasticsearch.

Well, we could have one index with many types within it - that's what I'm doing in the current version of our application. However, it appears that the Elastic documentation recommends against doing this.

It also wouldn't be 250 indexes per tenant - I'm handling the tenant routing by using the routing key, so there'd only be 250 indexes in total. Does that change your thinking?
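To put rough numbers on the layouts discussed in this thread, here is a small arithmetic sketch using the figures mentioned above (about 450 tenants, around 250 indexes, 10-15 new tenants a month). The layout names are just labels I made up, and the growth projection assumes a constant 15 tenants per month.

```python
def index_counts(tenants, object_types):
    """Number of indexes under each layout discussed in the thread."""
    return {
        "index_per_type_per_tenant": tenants * object_types,
        "index_per_tenant": tenants,
        "shared_indexes_with_routing": object_types,
    }


def tenants_after(months, current=450, added_per_month=15):
    """Tenant count after a number of months of linear growth."""
    return current + months * added_per_month


print(index_counts(450, 250))
# {'index_per_type_per_tenant': 112500, 'index_per_tenant': 450, 'shared_indexes_with_routing': 250}
print(index_counts(tenants_after(12), 250)["index_per_type_per_tenant"])  # 157500
```

With shared indexes and routing, the index count stays at 250 regardless of how many tenants are added; what grows instead is the amount of data per index and per shard.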
