Is there any way to configure elasticsearch so that the shards and replicas
of a given index are distributed evenly across all nodes? I've got a 6 node
cluster and all my indices are configured for 6 shards and 1 replica; it
would seem to me that they should distribute such that each node has a
single primary and a single replica shard for each index. However, in
practice the distribution scheme seems to be ignorant of indices, resulting
in most indices being spread across at most 2 or 3 of the nodes. My indices
hold log data and the general query usage hits 1, 7, 30, or 90 indices. If
the shards/replicas were distributed with some index awareness, it seems
like I'd get better distribution of load on queries regardless of the
number of indices being queried.

Setting the max shards per node option sort of works, but the shard routing
will sometimes make poor decisions, leaving a single shard unable to find a
suitable node. The option is also inflexible: with a hard cap there is no
room to re-create the lost replicas if one node disappears.
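
To check whether an index really is confined to 2 or 3 nodes, a small script can count the distinct nodes per index. This is only a sketch: the (index, node) pair format and the function names are assumptions for illustration, and the pairs would have to be extracted from the cluster state yourself.

```python
from collections import defaultdict


def nodes_per_index(assignments):
    """Map each index name to the number of distinct nodes holding its shards.

    `assignments` is an iterable of (index, node) pairs, one per shard copy
    (primary or replica). This input shape is an assumption of the sketch.
    """
    nodes = defaultdict(set)
    for index, node in assignments:
        nodes[index].add(node)
    return {index: len(held_on) for index, held_on in nodes.items()}


def poorly_spread(assignments, node_count):
    """Return the sorted names of indices that occupy fewer than node_count nodes."""
    return sorted(
        index
        for index, used in nodes_per_index(assignments).items()
        if used < node_count
    )
```

With 6 shards and 1 replica on 6 nodes, any index reported by poorly_spread(..., 6) is one that is not using every node.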

On 0.20.x or earlier, your only option is to set total_shards_per_node.
I have some custom code in my app that re-adjusts this based on the data
node count. If things do get stuck in yellow like you mention, it toggles
this number up by 1 and then back down for that specific index.
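
As a rough sketch of the node-count adjustment, the cap can be derived from the shard and replica counts. The setting name index.routing.allocation.total_shards_per_node is the real index-level setting; the helper names are made up for this example, and the resulting body would be sent in an index settings update (PUT /<index>/_settings).

```python
import math

# Real index-level setting name; the helper names below are hypothetical.
SETTING = "index.routing.allocation.total_shards_per_node"


def shards_per_node_limit(primaries, replicas, data_nodes):
    """Smallest per-node cap that still lets every shard copy be allocated."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / data_nodes)


def settings_body(limit):
    """Body for an index settings update carrying the new cap."""
    return {SETTING: limit}


limit = shards_per_node_limit(primaries=6, replicas=1, data_nodes=6)
print(settings_body(limit))
```

For 6 shards, 1 replica and 6 data nodes the cap comes out as 2. If one node drops out, the same 12 shard copies over 5 nodes need a cap of 3, which is why the value has to follow the node count.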

If you're on 0.90.x or later, ES has this built in:

Best Regards,
Paul

On Monday, April 1, 2013 11:03:36 AM UTC-6, Dave Rawks wrote:

Is there any way to configure elasticsearch such that all the given shards
and replicas for a given index will be evenly and equally distributed across
all nodes? [...]

@ppearcy is this code something you could share? Do you have an idea of
what causes things to get stuck in yellow?

On Tuesday, April 2, 2013 2:12:57 AM UTC+4, ppearcy wrote:

Using 0.20.x or earlier, your only option is to set total_shards_per_node.
I have some custom code in my app that re-adjusts this based on data node
count and if things do get stuck in yellow like you mention, toggles this
number up 1 and then down for that specific index.

Sure... You should be able to drop this code into a background thread
process that runs every so often. You'll need to make at least a couple of
tweaks:

You will not have an AMS class (that is for some internal monitoring
system we have)

You won't have an ESIndexer class that holds the elasticsearch client

Use at your own risk, no guarantees, yadayadayada
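
This is not the code referred to above, only a hypothetical sketch of the toggle it performs: for every index stuck in yellow, raise total_shards_per_node by 1 and then put it back. The function names and the shape of the inputs are assumptions; the actual settings calls and the background scheduling are left out.

```python
def toggle_steps(current_limit):
    """Limits to apply in order: one above the current cap, then the cap itself."""
    return [current_limit + 1, current_limit]


def plan_toggles(index_status, index_limits):
    """Return {index: [raised_limit, original_limit]} for every yellow index.

    index_status: index name -> "green" / "yellow" / "red" (hypothetical
    input, e.g. gathered from a per-index health check).
    index_limits: index name -> current total_shards_per_node value.
    """
    return {
        index: toggle_steps(index_limits[index])
        for index, status in index_status.items()
        if status == "yellow"
    }
```

A periodic job would apply the first value, wait for the index to go green, then apply the second.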

Best Regards,
Paul

On Tuesday, April 2, 2013 1:36:02 AM UTC-6, Mo wrote:

Hi Paul,

Thanks for your response.

@ppearcy is this code something you could share? do you have an idea of
what causes things to get stuck in yellow?

The upcoming 0.90 release has a vastly improved way of allocating shards.
So, if you are considering custom solutions, you might want to upgrade to
the release candidate and see if that solves your problem.

Jilles
