I have done a lot of reading in the "Definitive Elastic Guide" and read many of the suggestions on the web and in this community, yet I still have a handful of questions and concerns about the index and sharding approach because our scenario is significantly different.
- Multi-tenant environment where multiple customers store their docs in a single index
- Customers can vary significantly in size and usage (number of docs and read frequency)
- Need the ability to reindex a customer into its own dedicated index or shard when it gets "too big" for the shared index
- Need row-level (doc-level) security so that each tenant can only read its own documents
- Documents have common/identical fields but each tenant can define additional custom fields
- We will be using Elasticsearch 7.x or higher, so multiple document types per index are not supported
- This is a new product, so we have no idea what the workloads or data models will be. It is therefore important to have a flexible architecture and code that can be re-architected easily.
- Pricing for this product does not justify expensive X-Pack or other solutions. We have to stay with free or very low-cost options
- This will be hosted on AWS or Azure. No Elastic Cloud
This is our initial design idea. Feel free to make suggestions, poke holes, ask questions, or point out any issues.
- _id = tenantID + documentID to ensure uniqueness
- All the common/identical doc fields will be added to the index
- All customer-defined doc fields will be added to the index with a unique name to make sure they do not collide with each other.
- When a customer makes a request, use the ReadonlyREST plugin to provide doc-level access, and use source filtering to return only the common fields plus that customer's custom fields
- If the field count exceeds 1,000, move some customers with a large number of custom fields to a new index to reduce the number of fields.
- If a customer's read/query load or data volume gets too big, move that customer to a new index or a new shard.
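To make the ID, field-naming, and source-filtering ideas above concrete, here is a rough Python sketch that only builds the strings and the search request body (it does not talk to a cluster). Everything named here is an assumption for illustration and not something we have settled on: the `__` separator, the `tenant_id` field stored on every document, the example common field names, and the helper function names. The separator is there because plain concatenation could collide (tenant "1" + doc "23" versus tenant "12" + doc "3").

```python
from typing import Any, Dict, List

# Hypothetical separator; assumed never to appear inside a tenant ID.
SEPARATOR = "__"

# Hypothetical common fields shared by every tenant's documents.
COMMON_FIELDS = ["tenant_id", "title", "created_at"]


def make_doc_id(tenant_id: str, document_id: str) -> str:
    """Build the _id from tenantID + documentID, with a separator between them."""
    return f"{tenant_id}{SEPARATOR}{document_id}"


def namespaced_field(tenant_id: str, field_name: str) -> str:
    """Prefix a customer-defined field so it cannot collide with another tenant's field."""
    return f"{tenant_id}{SEPARATOR}{field_name}"


def build_search_body(
    tenant_id: str, user_query: Dict[str, Any], custom_fields: List[str]
) -> Dict[str, Any]:
    """Wrap the caller's query in a tenant filter and restrict _source to allowed fields."""
    includes = COMMON_FIELDS + [namespaced_field(tenant_id, f) for f in custom_fields]
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Assumes every document carries a keyword field named tenant_id.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
        "_source": {"includes": includes},
    }
```

Note that source filtering only trims what is returned; it does not stop a tenant from querying another tenant's field names, so the tenant filter (or the access-control plugin) still has to do the real isolation.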
As stated above, this is our initial design for a new product, and we are new to Elastic. We don't know what we don't know, and we don't have real-life data to base any decisions on. Here are some areas where we think we might run into issues and would like your thoughts:
- Is this a viable multi-tenant/shared index design?
- Is ReadonlyREST a good choice? Any other suggestions for controlling doc-level access?
- Can 1,000 fields be supported in a single index without outsized costs in RAM, CPU, and network?
- Is source filtering a good option for limiting the fields available in responses?
- Is our re-indexing and re-sharding approach sound for tenants that get too big?
- Are there any tools out there to help us move/split indexes?
- What else are we missing?
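For the field-count and re-indexing questions above, this is roughly what we have in mind, sketched in Python as plain request bodies. It assumes a `tenant_id` keyword field on every document, and the helper names are hypothetical. The first body follows the shape accepted by the Reindex API (POST _reindex) and the second the shape accepted by Delete By Query (POST <index>/_delete_by_query). The field counter is only an approximation of what Elasticsearch counts against index.mapping.total_fields.limit, which defaults to 1000.

```python
from typing import Any, Dict

# Elasticsearch default for index.mapping.total_fields.limit.
DEFAULT_FIELD_LIMIT = 1000


def count_mapped_fields(properties: Dict[str, Any]) -> int:
    """Approximate the mapped-field count: each entry, plus nested objects and multi-fields."""
    total = 0
    for mapping in properties.values():
        total += 1
        total += count_mapped_fields(mapping.get("properties", {}))
        total += count_mapped_fields(mapping.get("fields", {}))
    return total


def build_reindex_body(source_index: str, dest_index: str, tenant_id: str) -> Dict[str, Any]:
    """Body for POST _reindex that copies one tenant's documents to a dedicated index."""
    return {
        "source": {
            "index": source_index,
            "query": {"term": {"tenant_id": tenant_id}},
        },
        "dest": {"index": dest_index},
    }


def build_cleanup_body(tenant_id: str) -> Dict[str, Any]:
    """Body for POST <shared index>/_delete_by_query, run after the copy is verified."""
    return {"query": {"term": {"tenant_id": tenant_id}}}
```

The intended order would be: copy with the reindex body, verify document counts in the new index, switch the tenant's reads and writes over, and only then run the cleanup against the shared index.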
Your help, experience, feedback, and encouragement would be greatly appreciated.