We've been using Elasticsearch on AWS for two purposes: as a search engine
for user-created documents, and as a cache for activity feeds in our
application. We made a decision early on to treat every customer's content
as a distinct index, for full logical separation of customer data. We have
about three hundred indexes in our cluster, each with the default
5-shard/1-replica setup.
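To make the scale concrete, here's a back-of-the-envelope sketch of the shard count this setup implies, assuming the Elasticsearch defaults of 5 primary shards and 1 replica per index (the numbers below are just the figures stated above):

```python
# Rough shard arithmetic for ~300 per-customer indexes with default settings.
# Each index gets 5 primary shards, and each primary has 1 replica copy.
indexes = 300
primaries_per_index = 5
replicas_per_primary = 1

total_shards = indexes * primaries_per_index * (1 + replicas_per_primary)
print(total_shards)  # 3000
```

Every one of those shards is a full Lucene index the cluster has to allocate, track, and health-check, which is why the total matters more than the index count alone.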
Recently, we've had major problems with the cluster "locking up" to
requests and losing track of its nodes. We initially responded by
attempting to remove possible CPU and memory limits, and we placed all
nodes in the same AWS placement group to maximize inter-node bandwidth,
all to no avail. We eventually lost an entire production cluster, which
led us to split the indexes across two completely independent clusters,
each cluster taking half of the indexes, with application-level logic
determining where each index lives.
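The application-level routing could look something like the hypothetical sketch below: each customer index is pinned to one of the two independent clusters by a stable hash of the index name, so a given customer always resolves to the same cluster. The cluster URLs and function name are placeholders, not our actual code:

```python
import hashlib

# Placeholder endpoints for the two independent clusters.
CLUSTERS = ["http://es-cluster-a:9200", "http://es-cluster-b:9200"]

def cluster_for_index(index_name: str) -> str:
    """Deterministically map an index name to one cluster.

    Uses MD5 rather than Python's built-in hash(), which is randomized
    per process and would scatter a customer across clusters on restart.
    """
    digest = hashlib.md5(index_name.encode("utf-8")).digest()
    return CLUSTERS[digest[0] % len(CLUSTERS)]
```

The stable hash is the important part: as long as the cluster list doesn't change, reads and writes for a given customer always land on the same cluster.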
All that is to say: with our setup, are we running into an undocumented
practical limit on the number of indexes or shards in a cluster? It works
out to around 3,000 shards with our setup. Our logs show evidence of nodes
timing out their responses to massive shard status checks, and it gets
worse the more nodes there are in the cluster. It's also stable with only