Fair distribution of shards per node per index


(Steff) #1

I believe the current shards-distribution algoritm in ES aims at
having number_of_shards_all_in_all/number_of_nodes shards running on
each node. We are using one index for each month, so that all data
that "belongs" to a certain month is indexed into a particular index
representing that month. 99.99% percent of all data that is index into
ES during a concrete month "belongs" to that same month, so 99.99% of
the indexing going on during a concrete day works only against one
single index. All the other indices we keep around for "history"
purposes so that you can query back in time in historic data - we are
querying our data very seldom.

Because most of the indexing that goes on during a day goes on against
one single index, and because almost all of the work we do against ES
is about indexing new documents, we would like that the shards of that
particular index is fairly distributed among nodes. That does not seem
to be true. Basically we would like fair distribution af
number_of_shards/number_of_nodes PER INDEX. Does ES support that? Or
maybe there are a way to manually ask ES to relocate shards (maybe
only supporting swapping indices to keep the
number_of_shards_all_in_all/number_of_nodes invariant)

Regards, Per Steffensen


(Shay Banon) #2

Heya,

Yea, this is an area where elasticsearch should have better behavior.
There isn't an option ot have "per index" distribution, or weighted
distribution. There isn't a good answer now, except maybe for the ability
to control where an index can be placed (which nodes).

On Mon, Oct 31, 2011 at 12:19 PM, Steff steff@designware.dk wrote:

I believe the current shards-distribution algoritm in ES aims at
having number_of_shards_all_in_all/number_of_nodes shards running on
each node. We are using one index for each month, so that all data
that "belongs" to a certain month is indexed into a particular index
representing that month. 99.99% percent of all data that is index into
ES during a concrete month "belongs" to that same month, so 99.99% of
the indexing going on during a concrete day works only against one
single index. All the other indices we keep around for "history"
purposes so that you can query back in time in historic data - we are
querying our data very seldom.

Because most of the indexing that goes on during a day goes on against
one single index, and because almost all of the work we do against ES
is about indexing new documents, we would like that the shards of that
particular index is fairly distributed among nodes. That does not seem
to be true. Basically we would like fair distribution af
number_of_shards/number_of_nodes PER INDEX. Does ES support that? Or
maybe there are a way to manually ask ES to relocate shards (maybe
only supporting swapping indices to keep the
number_of_shards_all_in_all/number_of_nodes invariant)

Regards, Per Steffensen


(Mohit Anchlia) #3

On Mon, Oct 31, 2011 at 10:58 AM, Shay Banon kimchy@gmail.com wrote:

Heya,
Yea, this is an area where elasticsearch should have better behavior.
There isn't an option ot have "per index" distribution, or weighted
distribution. There isn't a good answer now, except maybe for the ability to
control where an index can be placed (which nodes).

I thought sharding is done based on the document id hash in a given
index. If that is true then creating new documents
would shard the data?

On Mon, Oct 31, 2011 at 12:19 PM, Steff steff@designware.dk wrote:

I believe the current shards-distribution algoritm in ES aims at
having number_of_shards_all_in_all/number_of_nodes shards running on
each node. We are using one index for each month, so that all data
that "belongs" to a certain month is indexed into a particular index
representing that month. 99.99% percent of all data that is index into
ES during a concrete month "belongs" to that same month, so 99.99% of
the indexing going on during a concrete day works only against one
single index. All the other indices we keep around for "history"
purposes so that you can query back in time in historic data - we are
querying our data very seldom.

Because most of the indexing that goes on during a day goes on against
one single index, and because almost all of the work we do against ES
is about indexing new documents, we would like that the shards of that
particular index is fairly distributed among nodes. That does not seem
to be true. Basically we would like fair distribution af
number_of_shards/number_of_nodes PER INDEX. Does ES support that? Or
maybe there are a way to manually ask ES to relocate shards (maybe
only supporting swapping indices to keep the
number_of_shards_all_in_all/number_of_nodes invariant)

Regards, Per Steffensen


(Steff) #4

On 31 Okt., 18:58, Shay Banon kim...@gmail.com wrote:

Heya,

Yea, this is an area where elasticsearch should have better behavior.
There isn't an option ot have "per index" distribution, or weighted
distribution. There isn't a good answer now, except maybe for the ability
to control where an index can be placed (which nodes).

Ok, thanks. Well it is open source, so maybe we will make something
ourselves. I didnt know about the abilitiy to control where an index
can be placed - can you give short explanation about how to control,
or maybe a pointer to a web-page explaining?

Thanks


(Steff) #5

I thought sharding is done based on the document id hash in a given
index. If that is true then creating new documents
would shard the data?

Yes documents are distributed among shards based on document id (by
default at least), but my problem is not that documents are not
distributed well among shards. The problem is that shards are not
distributed well among nodes.

Regards, Per Steffensen


(Shay Banon) #6

On Mon, Oct 31, 2011 at 9:45 PM, Mohit Anchlia mohitanchlia@gmail.comwrote:

On Mon, Oct 31, 2011 at 10:58 AM, Shay Banon kimchy@gmail.com wrote:

Heya,
Yea, this is an area where elasticsearch should have better behavior.
There isn't an option ot have "per index" distribution, or weighted
distribution. There isn't a good answer now, except maybe for the
ability to
control where an index can be placed (which nodes).

I thought sharding is done based on the document id hash in a given
index. If that is true then creating new documents
would shard the data?

Yes, thats what happens. But, then there is a question of how shards are
allocated among nodes. The way elasticsearch does that now is by aiming to
get an even number of shards per node. Thats not always
the desirable distribution (i.e. size based one, for example).

On Mon, Oct 31, 2011 at 12:19 PM, Steff steff@designware.dk wrote:

I believe the current shards-distribution algoritm in ES aims at
having number_of_shards_all_in_all/number_of_nodes shards running on
each node. We are using one index for each month, so that all data
that "belongs" to a certain month is indexed into a particular index
representing that month. 99.99% percent of all data that is index into
ES during a concrete month "belongs" to that same month, so 99.99% of
the indexing going on during a concrete day works only against one
single index. All the other indices we keep around for "history"
purposes so that you can query back in time in historic data - we are
querying our data very seldom.

Because most of the indexing that goes on during a day goes on against
one single index, and because almost all of the work we do against ES
is about indexing new documents, we would like that the shards of that
particular index is fairly distributed among nodes. That does not seem
to be true. Basically we would like fair distribution af
number_of_shards/number_of_nodes PER INDEX. Does ES support that? Or
maybe there are a way to manually ask ES to relocate shards (maybe
only supporting swapping indices to keep the
number_of_shards_all_in_all/number_of_nodes invariant)

Regards, Per Steffensen


(Shay Banon) #7

On Tue, Nov 1, 2011 at 9:30 AM, Steff steff@designware.dk wrote:

On 31 Okt., 18:58, Shay Banon kim...@gmail.com wrote:

Heya,

Yea, this is an area where elasticsearch should have better behavior.
There isn't an option ot have "per index" distribution, or weighted
distribution. There isn't a good answer now, except maybe for the ability
to control where an index can be placed (which nodes).

Ok, thanks. Well it is open source, so maybe we will make something
ourselves. I didnt know about the abilitiy to control where an index
can be placed - can you give short explanation about how to control,
or maybe a pointer to a web-page explaining?

Sure (note, its a new feature in 0.18):
http://www.elasticsearch.org/guide/reference/modules/cluster.html (see
Shard Allocation Filtering).

Thanks


(system) #8