If you consider tens of thousands of indices on tens of thousands of nodes,
and the master node is the only node that can write to the cluster state,
it will have a lot of work to do to keep up with all cluster state updates.
When the rate of changes to the cluster state increases, the master node
will be challenged to propagate the state changes to all other nodes
reliably and fast enough. There is no "distributed tree" yet; for example,
there are no special "forwarder nodes" that communicate the cluster state
to a partitioned set of nodes.
See "Scalable cluster state propagation (zen discovery)", issue #6186 in
elastic/elasticsearch on GitHub.
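To get a feel for how much routing information the master has to maintain and publish, a rough back-of-the-envelope sketch may help. It assumes only that the cluster state holds one routing entry per shard copy (each primary and each replica); the function name and the example numbers are made up for illustration.

```python
def routing_entries(num_indices, shards_per_index, replicas_per_shard):
    """Estimate the number of shard routing entries in the cluster state.

    Assumption: one entry per shard copy, i.e. each primary plus each
    of its replicas. Real clusters mix index sizes, so treat the result
    as an order of magnitude only.
    """
    copies_per_shard = 1 + replicas_per_shard
    return num_indices * shards_per_index * copies_per_shard


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 indices, 5 shards each, 1 replica.
    print(routing_entries(10_000, 5, 1))  # 10000 * 5 * 2 = 100000
```

Every node receives this state, so the count grows with the number of indices and shard copies regardless of how many machines are added.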
The cluster state is a big compressed JSON structure which also must fit
into the heap memory of master eligible nodes. ES also uses privileged
network traffic channels for cluster state communication to take precedence
over ordinary index/search messaging. But all these precautions may not be
enough at some point. You can observe that point by retrieving a growing
cluster state over the cluster state API and measuring the size and time of
this request.
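A minimal sketch of such a measurement, assuming a node reachable at http://localhost:9200 (an assumption, adjust to your setup) and the standard `_cluster/state` REST endpoint. The function names are hypothetical. Note that it measures the JSON response as received over HTTP, not the compressed form that is sent between nodes.

```python
import json
import time
import urllib.request


def fetch_cluster_state(url="http://localhost:9200/_cluster/state"):
    """Return the raw cluster state response body as bytes.

    The host and port are assumptions for a local test node.
    """
    with urllib.request.urlopen(url) as response:
        return response.read()


def measure_cluster_state(fetch=fetch_cluster_state):
    """Time one cluster state request and report its size.

    `fetch` is any callable returning the response body as bytes, so
    the measurement can be tried without a running cluster.
    """
    start = time.perf_counter()
    body = fetch()
    elapsed = time.perf_counter() - start
    state = json.loads(body)
    # The per-index metadata is expected under "metadata" -> "indices";
    # fall back to 0 if the response is shaped differently.
    indices = state.get("metadata", {}).get("indices", {})
    return {"bytes": len(body), "seconds": elapsed, "indices": len(indices)}


if __name__ == "__main__":
    print(measure_cluster_state())
```

Running this periodically and recording the numbers gives a simple trend of how the cluster state grows as indices are added.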
On the other hand, you can have a calm cluster state and many thousands of
indices when the type mappings are constant, no field updates occur, and no
nodes connect/disconnect. How you have to operate the cluster depends on
the situation. One important thing is to allocate enough resources to
master-eligible data-less nodes so they are not hindered by extra
search/index load.
N.B. 20 nodes is not a big cluster. There are ES clusters of hundreds and
thousands of nodes. From my understanding, the communication between the
master and 20 nodes is not a serious problem. This becomes an issue at ~500-1000
nodes.
Jörg
On Sat, Sep 27, 2014 at 1:12 AM, Todd Nine tnine@apigee.com wrote:
Hi Jorg,
We're storing each application in its own index so we can manage it
independently of others. There's no set load or usage on our
applications. Some will be very small, a few hundred documents. Others
will be quite large, in the billions. We have no way of knowing what the
usage profile is. Rather, we're initially thinking that expansion will
occur using a combination of additional indexes and aliases referencing
those indexes. This allows us to automate the management of the aliases
and indexes, and in turn allows us to scale them to the needs of the
application without over-allocating unused capacity. For instance, for
write-heavy applications we can allocate more shards (via alias); for
read-heavy applications we can allocate more replicas.
We're not running our cluster on a single node. Our cluster is small to
begin with; it's 6 nodes in our current POC. Ultimately I expect us to
grow each cluster we stand up to 20 or so nodes. We'll expand as necessary
to support the number of shards and replicas and keep our performance up.
I'm not particularly worried about our ability to scale horizontally with
our hardware.
Rather, I'm concerned with how far we can scale on our number of indexes,
and how that relates to the number of machines. When we keep adding
hardware, does this increase the upper bounds of the number of indexes we
can have? Not the physical shards and replicas, but the routing
information for the master of the shards and location of the replicas.
I've done distributed data storage for many years, and none of the
documentation on ES makes it clear if this becomes an issue operationally.
I'm leery to just assume it will "just work". When implementing something
like this, you either have to do a distributed tree for your metadata to
get the partitioning you need to scale infinitely, or every node must store
every shard's master information. How does it work in ES?
Thanks,
Todd
On Friday, September 26, 2014 4:29:53 PM UTC-6, Jörg Prante wrote:
Why do you want to create a huge number of indexes on just a single node?
There are smarter methods to scale. Use over-allocation of shards. This
is explained by kimchy in this thread
http://elasticsearch-users.115913.n3.nabble.com/Over-allocation-of-shards-td3673978.html
TL;DR: you can create many thousands of aliases on a single index (or a
few indices) with just a few shards. ES defines no limit; when your
configuration / hardware capacity is exceeded, you will see the node
getting sluggish.
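As a rough illustration of the alias approach, here is a sketch that builds a request body for the `_aliases` API, adding one filtered, routed alias per tenant on a shared index. It assumes every document carries a `tenant_id` field; that field name, the `tenant-` alias prefix, the index name `apps`, and the function names are hypothetical.

```python
import json


def tenant_alias_action(index, tenant_id, field="tenant_id"):
    """Build one "add" action for the _aliases API.

    The alias filters on the (assumed) tenant field and sets a routing
    value, so a tenant's documents end up on a single shard.
    """
    return {
        "add": {
            "index": index,
            "alias": f"tenant-{tenant_id}",
            "filter": {"term": {field: tenant_id}},
            "routing": str(tenant_id),
        }
    }


def aliases_request_body(index, tenant_ids):
    """Combine the per-tenant actions into one _aliases request body."""
    return {"actions": [tenant_alias_action(index, t) for t in tenant_ids]}


if __name__ == "__main__":
    # The printed JSON would be sent with POST to /_aliases.
    print(json.dumps(aliases_request_body("apps", ["a1", "a2"]), indent=2))
```

Applications then index and search through their own alias, while the number of physical indices and shards stays small.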
Jörg
On Fri, Sep 26, 2014 at 11:23 PM, Todd Nine tn...@apigee.com wrote:
Hey guys. We're building a multi-tenant application, where users create
applications within our single server. For our current ES scheme, we're
building an index per application. Are there any stress tests or
documentation on the upper bounds of the number of indexes a cluster can
handle? From my current understanding of metadata and routing, every node
caches the metadata of all the indexes and shards for routing. At some
point, this will obviously overwhelm the node. Is my current understanding
correct, or is this information partitioned across the cluster as well?
Thanks,
Todd
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/200e1ce7-c56f-49d4-9c02-4b1dcc570bf2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.