Managing shard distribution

Dave_Rawks_2 · April 1, 2013, 5:03pm

Is there any way to configure elasticsearch such that all the shards
and replicas for a given index will be evenly distributed across
all nodes? I've got a 6-node cluster and all my indices are configured for
6 shards and 1 replica; it would seem to me that they should distribute
such that each node has a single primary and a single replica shard for
each index. However, in practice the distribution scheme seems to be
ignorant of indices, resulting in most indices being spread across at most 2
or 3 of the nodes. My indices hold log data and the general query usage
hits 1, 7, 30, or 90 indices. If the shards/replicas were distributed with
some index awareness, it seems like I'd get better distribution of load on
queries regardless of the number of indices being queried.

Setting the max shards per node option sort of works, but the shard routing
will sometimes make poor decisions, resulting in a single shard unable to
find a suitable node. There is also no flexibility in the option to allow
the extra replicas to be re-allocated in the case that one node
disappears.

Any help would be muchly appreciated.

-Dave

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ppearcy · April 1, 2013, 10:12pm

Using 0.20.x or earlier, your only option is to set total_shards_per_node.
I have some custom code in my app that re-adjusts this based on data node
count and if things do get stuck in yellow like you mention, toggles this
number up 1 and then down for that specific index.
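
For illustration only, the arithmetic behind such an adjustment might look like the sketch below. This is not Paul's code: the class ShardCap and its method are hypothetical names, and the only thing taken from the thread is the idea of deriving index.routing.allocation.total_shards_per_node from the shard count and the number of data nodes.

```java
// Hypothetical helper, not taken from the gist: computes a per-node shard cap for one index.
final class ShardCap {
    /**
     * Smallest cap that still lets every shard copy be placed:
     * ceil(primaries * (1 + replicas) / dataNodes).
     */
    static int totalShardsPerNode(int primaries, int replicas, int dataNodes) {
        if (primaries < 1 || replicas < 0 || dataNodes < 1) {
            throw new IllegalArgumentException("need primaries >= 1, replicas >= 0, dataNodes >= 1");
        }
        int copies = primaries * (1 + replicas);
        return (copies + dataNodes - 1) / dataNodes; // integer ceiling
    }

    public static void main(String[] args) {
        // 6 primaries with 1 replica each on 6 data nodes: 12 copies
        System.out.println(totalShardsPerNode(6, 1, 6)); // prints 2
        // the same index after one node disappears: 12 copies on 5 nodes
        System.out.println(totalShardsPerNode(6, 1, 5)); // prints 3
    }
}
```

With 6 nodes the cap of 2 leaves no slack at all, so every node has to end up with exactly two copies. That may be one reason a shard can get stuck: the last free slot can be on a node that already holds the other copy of the same shard. Raising the cap by one gives the allocator room to place it.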

If you're on 0.90.x or later, ES has this built in:

github.com/elastic/elasticsearch, Pull Request #2556 by s1monw (opened 17 Jan 2013):
#2555 Added BalancedShardsAllocator that balances shards based on a weight function.

- Weights are calculated per index and incorporate index-level, global and primary-related parameters
- Balance operations are executed based on a win-maximization strategy that tries to relocate shards first that offer the biggest gain towards the weight function's optimum
- The WeightFunction allows settings to prefer index-based balance over global balance and vice versa
- Balance operations can be throttled by raising a threshold, resulting in less aggressive balance operations
- WeightFunction ships with defaults to achieve evenly distributed indexes while maintaining a global balance

This closes issue #2555
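
To make the description above a bit more concrete, here is a toy weight function in Java. It is only a sketch of the idea described in the pull request, not the code from the PR: the class BalanceSketch, the method names and the factor values are all made up for illustration, and the real allocator may combine and normalise its terms differently.

```java
// Toy model of an index-aware balance weight; all names and values are hypothetical.
final class BalanceSketch {
    // Illustrative factors only, not claimed to be Elasticsearch defaults.
    static final double SHARD_FACTOR = 0.25;   // global balance: all shards on the node
    static final double INDEX_FACTOR = 0.5;    // per-index balance: shards of one index on the node
    static final double PRIMARY_FACTOR = 0.25; // primaries of that index on the node

    /** Positive result: the node holds more than its fair share; negative: less. */
    static double weight(int nodeShards, double avgShards,
                         int nodeIndexShards, double avgIndexShards,
                         int nodeIndexPrimaries, double avgIndexPrimaries) {
        return SHARD_FACTOR * (nodeShards - avgShards)
             + INDEX_FACTOR * (nodeIndexShards - avgIndexShards)
             + PRIMARY_FACTOR * (nodeIndexPrimaries - avgIndexPrimaries);
    }

    /** A move is only worth it if the weight gap between two nodes exceeds the threshold. */
    static boolean shouldRelocate(double heavierWeight, double lighterWeight, double threshold) {
        return heavierWeight - lighterWeight > threshold;
    }

    public static void main(String[] args) {
        // Two nodes with the same total shard count (20 each, cluster average 20),
        // but one holds 6 of an index's 12 copies and the other holds none.
        double crowded = weight(20, 20.0, 6, 2.0, 3, 1.0);
        double empty = weight(20, 20.0, 0, 2.0, 0, 1.0);
        System.out.println(shouldRelocate(crowded, empty, 1.0)); // prints true
    }
}
```

The point of the example is that both nodes look identical if you only count total shards. It is the per-index term that makes the lopsided index visible, and raising the threshold makes the allocator tolerate more imbalance before it moves anything.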

Best Regards,
Paul

Mohammady_Mahdy · April 2, 2013, 7:35am

@ppearcy is this something you could share? Do you have an idea of what
causes things to get stuck in red?

Mohammady_Mahdy · April 2, 2013, 7:36am

Hi Paul,

Thanks for your response.

@ppearcy is this code something you could share? Do you have an idea of
what causes things to get stuck in yellow?

ppearcy · April 2, 2013, 5:26pm

Sure... You should be able to drop this code into a background thread
process that runs every so often. You will need to make at least a couple of
tweaks:

- You will not have an AMS class (that is for some internal monitoring
  system we have)
- You won't have an ESIndexer class that holds the elasticsearch client

Use at your own risk, no guarantees, yadayadayada

https://gist.github.com/ppearcy/5294200 (gistfile1.txt)

	/**
	 * Iterate over the indexes and automatically set the index.routing.allocation.total_shards_per_node
	 * based on the total shards for the index and the number of data nodes that we have
	 */
	public void setTotalShardsPerNode() {
		ClusterHealthResponse health = ESIndexer.es.client.admin().cluster().health(new ClusterHealthRequest()).actionGet();

		// These values are used to decide what to do below
		int numDataNodes = health.getNumberOfDataNodes();
		int initShards = health.getInitializingShards();

This file has been truncated; see the gist for the full code.

Best Regards,
Paul

Jilles_van_Gurp · April 3, 2013, 7:05am

The upcoming 0.90 release has a vastly improved way of allocating shards.
So, if you are considering custom solutions, you might want to upgrade to
the release candidate and see if that solves your problem.

Jilles

Mohammady_Mahdy · April 3, 2013, 7:40am

many thanks
