Elasticsearch 6.3 (but our ES experience is all from earlier versions)
We have a typical multi-tenant cloud app with 5,000 accounts; each account has 500 to 1,000,000 records/documents.
When we use a single monolithic Elasticsearch index, it is ~800GB in size, and our queries are really slow -- 5-14 seconds, even with the data on a hot SSD-based server. (We sort on various columns, and every query is scoped to a single accountid.)
We know that an index per account would give us super-fast ES queries, but partitioning into 5,000 indexes seems over the top, and it might require a rather large cluster so that no single server has to handle 1,000 indexes.
In my SQL thinking, I keep wanting to give the big monolithic index a "compound index" aspect (leading with accountid), since every query is prefixed by an accountid -- but what is the best way to do this in ES?
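The closest thing we have found to that compound-index idea is putting the accountid term in the bool query's filter context, so ES can cache it as a bitset and reuse it across queries. A sketch of the search body we have in mind (the accountid and created_at field names are just ours; sizes are placeholders):

```python
import json

def account_query(account_id, sort_field, size=50):
    """Build a search body scoped to one account.

    The accountid term sits in filter context, so Elasticsearch can
    cache it as a bitset -- roughly the role the leading column of a
    SQL compound index plays in our mental model.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"accountid": account_id}}
                ]
            }
        },
        "sort": [{sort_field: {"order": "desc"}}],
        "size": size,
    }

# Example: the body we'd POST to /our_index/_search
print(json.dumps(account_query(42, "created_at"), indent=2))
```

This still scans the full shard's accountid postings on every query, though, which is why we suspect the real fix is in how the data is partitioned, not the query shape.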
Do bucket aggregations solve this? (On paper they appear to.)
-- Will ES always keep a top-level bucket in a single shard?
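For reference, the bucket aggregation we were picturing is a terms aggregation on accountid. A sketch (again, accountid is our field name); as far as we can tell this only groups results at query time and does not change how documents are laid out across shards, which is the crux of our question:

```python
def per_account_buckets(top_n=10):
    """Terms bucket aggregation on accountid -- what we were picturing.

    size=0 suppresses hits; the response would contain one bucket per
    account with a doc count. Whether this helps per-account query
    latency at all is exactly what we're unsure about.
    """
    return {
        "size": 0,
        "aggs": {
            "by_account": {
                "terms": {"field": "accountid", "size": top_n}
            }
        },
    }
```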
Is multi-shard + routing the solution?
-- And then run 100 or 250 shards across 5 or more servers?
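If routing is the answer, we think the requests would look roughly like this -- indexing and searching with routing=accountid so all of an account's documents hash to one shard, and searches only touch that shard. (Index name, doc type, and field names below are just ours; the search would presumably still need the accountid filter, since routing picks a shard but other accounts can share it.)

```python
def index_request(index, doc_type, doc_id, account_id):
    """Request line for indexing one document with custom routing.

    Every document carrying the same routing value lands on the same
    shard, so one account never spans shards.
    """
    return f"PUT /{index}/{doc_type}/{doc_id}?routing={account_id}"

def search_request(index, account_id):
    """Request line for a search confined to the account's shard."""
    return f"GET /{index}/_search?routing={account_id}"

# Example request lines for account 42
print(index_request("records", "_doc", "a1", 42))
print(search_request("records", 42))
```

Back-of-envelope from the numbers above: 800GB over 100 shards is ~8GB per shard (250 shards would be ~3.2GB), and 100 shards across 5 servers is 20 shards per server.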
How do we find a path forward?