Thanks in advance for reading and for any answers! We're considering using ES with an index-per-user design, and have some specific questions about how it scales. I've read up on the subject, and a common suggestion for this scenario is a shared index with a user-id filter. However, I think we have a strong reason not to use that design (reason below, feedback welcome).
Why we prefer index-per-user over shared index:
We often need to fetch all of the documents for a user. With a shared index we can filter by user-id, but that user's documents are spread across a large index, so fetching N documents can still mean up to N disk reads (a 10k-document user would take over 1 second of IO time). With index-per-user, the entire index is only 2-3MB for 10k documents (our documents are small, primarily non-analyzed fields), so we can load it in a few disk reads (AWS/EBS block size is 256KB). At the same disk speed we can load about 500 users per second, which is ~833 times faster than with a shared index (actual performance varies, but it's much faster in an index-per-user design). Caching at the page cache level is also greatly improved, since many documents from a user fit into a single page.
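For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch in Python. The 6000 IOPS budget and the "one random read per document" worst case are assumptions made for illustration (they are not stated in the post); they happen to reproduce the 500 users/second and ~833x figures when combined with the 3MB index size and 256KB block size above.

```python
import math

# Assumptions for illustration only (not measured values):
ASSUMED_IOPS = 6000               # random-read budget of the volume
DOCS_PER_USER = 10_000            # the "10k document user" from the post
INDEX_BYTES = 3 * 1024 * 1024     # upper end of the 2-3MB estimate
BLOCK_BYTES = 256 * 1024          # block size quoted in the post


def users_per_second(reads_per_user, iops=ASSUMED_IOPS):
    """How many users can be fully loaded per second at a given IOPS budget."""
    return iops / reads_per_user


# Shared index, worst case: every document costs one random read.
shared_reads = DOCS_PER_USER
# Index-per-user: the whole index is read in block-sized chunks.
dedicated_reads = math.ceil(INDEX_BYTES / BLOCK_BYTES)

shared_rate = users_per_second(shared_reads)
dedicated_rate = users_per_second(dedicated_reads)
speedup = dedicated_rate / shared_rate

print(f"shared index:   {shared_reads} reads/user, {1 / shared_rate:.2f} s per user")
print(f"index-per-user: {dedicated_reads} reads/user, {dedicated_rate:.0f} users/s")
print(f"speedup: ~{speedup:.0f}x")
```

The shared-index number is a worst case: if several of a user's documents land in the same block or are already in the page cache, the gap narrows.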
Our use case:
- over 1M users. No requirement to search across all of them.
- only a small number of users are in use at a given time (~2000). Most are cold and could be closed using the ES close API (we're happy to manage open/close calls ourselves).
- 500-500k documents per user (99% under 25k). Documents are quite small and follow a consistent schema/mapping. As a result, 99% of indexes would be under 5MB.
- we also want to do some aggregation, which will be fast on the smaller indexes, but slows down on the larger shared indexes (still benchmarking)
Questions:
- How much memory does each index take in the cluster state? Does it grow linearly with the number of indexes?
- If an index is closed, how much memory does each index take in the cluster state?
- Is it crazy to attempt a cluster with >1M indexes (mostly closed) and ~5k open indexes? If so, please explain why.
- Any gotchas we should be aware of if considering launching "index-per-user" at this scale?
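To make the "manage open/close calls ourselves" part concrete, here is a minimal sketch of an LRU pool that keeps at most a fixed number of indexes open. The `user-<id>` index naming, the `OpenIndexPool` class, and the `localhost:9200` address are all hypothetical; the only ES calls it relies on are the `POST /<index>/_open` and `POST /<index>/_close` endpoints. It ignores concurrency, error handling, and the time an index needs to recover after opening.

```python
from collections import OrderedDict
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address, adjust for your cluster


def es_post(path):
    """Send an empty POST to the cluster (used for _open / _close)."""
    req = urllib.request.Request(ES_URL + path, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


class OpenIndexPool:
    """Keeps at most max_open per-user indexes open, closing the least recently used."""

    def __init__(self, max_open, open_fn=None, close_fn=None):
        self.max_open = max_open
        # The callbacks are injectable so the eviction logic can be tested offline.
        self._open_fn = open_fn or (lambda name: es_post(f"/{name}/_open"))
        self._close_fn = close_fn or (lambda name: es_post(f"/{name}/_close"))
        self._open = OrderedDict()  # index name -> True, oldest first

    @staticmethod
    def index_name(user_id):
        return f"user-{user_id}"  # hypothetical naming scheme

    def acquire(self, user_id):
        """Ensure the user's index is open and return its name."""
        name = self.index_name(user_id)
        if name in self._open:
            self._open.move_to_end(name)  # mark as most recently used
            return name
        while len(self._open) >= self.max_open:
            oldest, _ = self._open.popitem(last=False)
            self._close_fn(oldest)
        self._open_fn(name)
        self._open[name] = True
        return name
```

In practice `max_open` would be set somewhat above the ~2000 concurrently active users so that indexes are not opened and closed in a tight loop.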
We'll also be benchmarking the above, but I thought I'd ask and get some feedback in case we're missing something.
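One way the cluster-state questions could be benchmarked is sketched below: fetch the metadata section of the cluster state and total the serialized size of each index's entry, grouped by its open/closed state. This assumes the response of `GET /_cluster/state/metadata` has the shape `{"metadata": {"indices": {<name>: {"state": ..., ...}}}}`; the function names and the `localhost:9200` address are hypothetical. JSON size is only a proxy for the on-heap cost, so treat the output as a relative measure.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_cluster_metadata(es_url="http://localhost:9200"):
    """Fetch only the metadata section of the cluster state."""
    with urllib.request.urlopen(es_url + "/_cluster/state/metadata") as resp:
        return json.load(resp)


def metadata_bytes_by_state(cluster_state):
    """Total the serialized size of per-index metadata, grouped by index state."""
    totals = defaultdict(lambda: {"indices": 0, "json_bytes": 0})
    for name, meta in cluster_state["metadata"]["indices"].items():
        bucket = totals[meta.get("state", "unknown")]
        bucket["indices"] += 1
        bucket["json_bytes"] += len(json.dumps(meta))
    return dict(totals)
```

Running `metadata_bytes_by_state(fetch_cluster_metadata())` at, say, 1k, 10k, and 100k indexes would show whether the growth is roughly linear and how much a closed index still contributes.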
Thanks in advance,