We run a SaaS platform that relies increasingly on Elasticsearch. We're looking to improve our ability to scale and are evaluating several approaches to mitigate the noisy-neighbor problem within shards.
We really like the approach outlined here of per-tenant index aliases with filtering and routing (and the ability this provides for us to split large users off to their own index): https://www.elastic.co/blog/found-multi-tenancy
However, I'm a bit concerned about having 150,000 index aliases, and I'd like to plan for future growth up to 250,000 tenants. This post discourages having "hundreds of thousands" of aliases, so I'm not sure this is the best approach now: "Is there limit for alias indexes?"
Is this many index aliases okay, or are we getting into multiple-cluster territory?
Hundreds or thousands of aliases are fine. Hundreds of thousands, millions, etc. are not OK.
BTW, aliases are only one of quite a few architecture considerations. Not sure how you intend to slice the clusters, but Cross Cluster Search might be something you could use as well. I'd generally say that humongous clusters are a bit of a problem because of the blast radius; smaller clusters might be easier to manage.
At the moment we have all tenants in a single index* and are doing the filtering manually when we build the queries. I'm considering the aliases as a way to:
a) Enforce our per-tenant filtering
b) Switch to using the tenant id as the routing value, while still letting us split some of our biggest consumers off onto their own index so their data can span multiple shards.
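For illustration, here is a minimal sketch of what a per-tenant filtered alias with routing could look like as a request body for the `_aliases` API. The field name `tenant_id`, the `tenant-` alias prefix, and the helper names are assumptions made up for this example, not something taken from our actual mapping.

```python
import json


def tenant_alias_action(tenant_id, index, alias_prefix="tenant-", tenant_field="tenant_id"):
    """Build one 'add' action for POST /_aliases.

    alias_prefix and tenant_field are hypothetical names chosen for this sketch.
    """
    return {
        "add": {
            "index": index,
            "alias": f"{alias_prefix}{tenant_id}",
            # The filter restricts every search through the alias to this tenant.
            "filter": {"term": {tenant_field: tenant_id}},
            # The routing value is applied to both indexing and searching.
            "routing": str(tenant_id),
        }
    }


def alias_request_body(tenant_ids, index):
    """Combine several alias actions into a single _aliases request body."""
    return {"actions": [tenant_alias_action(t, index) for t in tenant_ids]}


if __name__ == "__main__":
    print(json.dumps(alias_request_body(["42", "43"], "people"), indent=2))
```

Splitting a large tenant off later would then mean pointing that tenant's alias at a dedicated index instead of the shared one, without changing how the application queries it.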
Our cluster is performing within our expectations at the moment, but we are well aware of the single point of failure we have by being on a single cluster like this.
What other architecture approaches should we be considering?
*We have many indices, as we're indexing many unrelated things for our tenants, e.g. People, Events, Media, etc. Each object type has its own index, but all tenants share the single index for that document type.
What kinds of things can I test for when experimenting to find a feasible number of aliases per cluster? Is it only heap memory, or are there other overheads with aliases as well?
The aliases will be part of the cluster state. 7.0 is changing the game there, but I would generally make sure that your cluster stays responsive to dropped nodes, index creations, master elections, and so on.
In general I think you will need to figure out if that is working for you and your scenario. We consider that kind of setup as an outlier and IMO don't actively test that.
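One rough way to check responsiveness while experimenting is to time a couple of cluster-level API calls before and after adding batches of aliases. The sketch below is only an assumption about how such a probe might look: `probe`, `timed`, and `fetch_json` are hypothetical helpers, and the base URL is whatever your test cluster listens on. It uses the `_cluster/health` and `_cluster/pending_tasks` endpoints.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=10.0):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def timed(fetch, url):
    """Return (elapsed_seconds, result) for a single fetch call."""
    start = time.perf_counter()
    result = fetch(url)
    return time.perf_counter() - start, result


def probe(base_url, fetch=fetch_json):
    """Time two cluster-level calls and summarise what they returned.

    The fetch argument can be swapped out, e.g. for an authenticated client.
    """
    health_s, health = timed(fetch, f"{base_url}/_cluster/health")
    tasks_s, pending = timed(fetch, f"{base_url}/_cluster/pending_tasks")
    return {
        "health_seconds": health_s,
        "status": health.get("status"),
        "pending_tasks_seconds": tasks_s,
        "pending_tasks": len(pending.get("tasks", [])),
    }
```

Running something like `probe("http://localhost:9200")` repeatedly while creating aliases, dropping a node, or creating an index would show whether latencies or the pending task queue grow as the alias count goes up. It does not replace watching heap usage and master logs.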