Designing a massive multi-tenant Elasticsearch architecture

I have done a lot of reading in the "Definitive Elastic Guide" and read many of the suggestions on the web and in this community, yet I still have a handful of questions and concerns about the index and sharding approach because our scenario is significantly different.

The Scenario

  1. Multi-tenant env where multiple customers store their docs in a single index
  2. Customers can vary significantly in size and usage (number of docs and read frequency)
  3. Need the ability to reindex a customer into their own dedicated index or shard when they get "too big" for the shared index
  4. Need row-level security to provide tenant-level read access.
  5. Documents have common/identical fields but each tenant can define additional custom fields
  6. We will be using Elasticsearch 7.x or higher, so no multi _doc type support
  7. This is a new product, and thus we have no idea what the workloads or data models will be. So it's important to have a flexible architecture and code that can support easy re-architecture.
  8. Pricing for this product does not justify expensive X-Pack or other solutions. We have to stay with free or very low cost options
  9. This will be hosted on AWS or Azure. No Elastic Cloud

The Design
This is our initial design idea. Feel free to suggest, poke holes, ask questions, point out any issues...

  1. _id = tenantID + documentID to ensure uniqueness
  2. All the common/identical doc fields will be added to the index
  3. All customer-defined doc fields will be added to the index with a unique name to make sure they do not collide with each other.
  4. When a customer makes a request, use the RESTReadOnly plugin to provide doc-level access and use source filtering to return only the common fields + that customer's custom fields
  5. If the field count gets above 1000, move some customers with a large number of custom fields to a new index to reduce the number of fields.
  6. If a customer's read and query load or data gets too big, move them to a new index or new shard.
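
To make points 1, 3 and 4 concrete, here is a minimal sketch in Python that only builds the ID, the field names and the search body (no cluster calls). Several things here are assumptions for illustration and not part of the design above: the `::` separator, the `tenant__field` naming scheme, the helper names, and a `tenant_id` keyword field stored on every document. A separator is used because plain concatenation could collide (tenant "ab" + doc "c" vs tenant "a" + doc "bc").

```python
import json

SEPARATOR = "::"  # hypothetical separator, assumed never to occur in a tenant ID


def make_doc_id(tenant_id, document_id):
    """Build a per-tenant unique _id (point 1)."""
    if SEPARATOR in tenant_id:
        raise ValueError("tenant ID must not contain the separator")
    return f"{tenant_id}{SEPARATOR}{document_id}"


def custom_field_name(tenant_id, field):
    """Prefix a custom field with its tenant so names cannot collide (point 3)."""
    return f"{tenant_id}__{field}"


def build_tenant_search(tenant_id, common_fields, custom_fields, user_query):
    """Wrap the caller's query in a tenant filter and limit _source (point 4).

    Assumes every document carries a `tenant_id` keyword field.
    """
    visible = list(common_fields) + [
        custom_field_name(tenant_id, f) for f in custom_fields
    ]
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
        "_source": {"includes": visible},
    }


body = build_tenant_search(
    "acme", ["title", "created_at"], ["color"], {"match": {"title": "invoice"}}
)
print(make_doc_id("acme", "42"))
print(json.dumps(body, indent=2))
```

Note that `_source` filtering only controls which fields come back in the hits; on its own it does not stop a query from searching or aggregating on another tenant's fields.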

Potential Problems
As stated above, this is our initial design for a new product and we are new to Elastic. We don't know what we don't know, and we don't have real-life data to make any decisions. But here are some areas where we think we might run into issues, and we would like your thoughts on them:

  1. Is this a viable multi-tenant/shared index design?
  2. Is RESTReadOnly a good choice? Any other suggestions for controlling doc-level access?
  3. Can 1000 fields be supported in a single index without outsized costs on RAM, CPU and network?
  4. Is source filtering a good option for limiting available fields in responses?
  5. Is our re-indexing and re-sharding a sound approach for tenants that get too big?
  6. Are there any tools out there to help us move/split indexes?
  7. What else are we missing?
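
For question 3, the 1000 figure matches the default of the `index.mapping.total_fields.limit` index setting. Below is a rough sketch of how one might check how close a mapping is to that limit. It takes the `properties` section of a mapping as a plain dict (for example, as returned by the get-mapping API) and counts object fields, leaf fields and multi-fields. The sample mapping and the function name are made up for illustration, and the count is an approximation of what Elasticsearch itself enforces.

```python
def count_mapped_fields(properties):
    """Approximate field count: every entry (object or leaf) plus multi-fields."""
    total = 0
    for definition in properties.values():
        total += 1  # the field or object itself
        total += len(definition.get("fields", {}))  # multi-fields such as .raw
        total += count_mapped_fields(definition.get("properties", {}))  # children
    return total


# Hypothetical mapping with common fields and one tenant-prefixed custom field.
mapping = {
    "properties": {
        "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "tenant_id": {"type": "keyword"},
        "address": {
            "properties": {
                "city": {"type": "keyword"},
                "zip": {"type": "keyword"},
            }
        },
        "acme__color": {"type": "keyword"},
    }
}

print(count_mapped_fields(mapping["properties"]))  # prints 7
```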

Your help, experience, feedback, encouragement would be greatly appreciated.

My first question would be: how important is this project to the business?

very important

Ok, well you can take this with a grain of salt, but you'd probably be best off investing in getting someone in - as in paying for their time - to spend time understanding this in detail and giving you advice and direction. We have services that can assist, but there are other people out there too.

I say that because while we can provide answers here to the best of our ability, it is not what I would bet the future of a business-critical project on. And yes, that applies to the advice that I give as well :stuck_out_tongue:

On to your questions:

  1. Depends on what sort of data this is, is it time based or something else?
  2. We'd suggest using our Security functionality with field and document security. It's an OK choice if it works for you, though
  3. What is "outsized"?
  4. It's an option. Whether it's a good one is questionable unless you also validate queries to stop someone getting around it by querying your cluster directly
  5. Yes, it's a sane approach. Just put some kind of monitoring around things so you can spot large tenants before they get large
  6. Use the _reindex and/or _split APIs. There may be wrappers around those, but I haven't seen any yet. Alternatively, see whether ILM will work for you
  7. See my earlier points. I would strongly suggest that it'd be worth using Elastic Cloud: it saves you from managing the underlying instances and gives you access to field and document level security, machine learning and alerting to automatically track tenant growth and let you know if there are anomalies, automated backups, the latest versions and heaps more
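
As a rough illustration of points 5 and 6, the request bodies involved could look something like the sketch below. It only builds the JSON bodies in Python and sends nothing. The `tenant_id` field, the index names and the helper names are assumptions for the example, not something from the original design.

```python
import json


def tenant_doc_counts_body(top_n):
    """Body for POST <shared-index>/_search: doc count per tenant, largest first."""
    return {
        "size": 0,  # no hits, only the aggregation
        "aggs": {"by_tenant": {"terms": {"field": "tenant_id", "size": top_n}}},
    }


def reindex_tenant_body(tenant_id, source_index, dest_index):
    """Body for POST _reindex: copy only one tenant's docs to a dedicated index."""
    return {
        "source": {
            "index": source_index,
            "query": {"term": {"tenant_id": tenant_id}},
        },
        "dest": {"index": dest_index},
    }


def delete_tenant_body(tenant_id):
    """Body for POST <shared-index>/_delete_by_query, once the copy is verified."""
    return {"query": {"term": {"tenant_id": tenant_id}}}


def split_index_body(current_shards, target_shards):
    """Body for POST <source-index>/_split/<target-index>.

    The target shard count has to be a larger multiple of the current count.
    """
    if target_shards <= current_shards or target_shards % current_shards != 0:
        raise ValueError("target shards must be a larger multiple of current shards")
    return {"settings": {"index.number_of_shards": target_shards}}


print(json.dumps(reindex_tenant_body("acme", "shared-docs", "acme-docs"), indent=2))
print(json.dumps(split_index_body(2, 4)))
```

The source index has to be blocked for writes (`index.blocks.write: true`) before a split, so for a busy shared index the reindex route may be the less disruptive of the two.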