Move all index shards to a node


(Jorge Urdaneta) #1

Hi,

We're trying to improve bulk indexing performance in a cluster. The
problem is that dynamic mapping updates slow the operations down as
the index grows.

Setting refresh_interval to -1 doesn't help, because dynamic mapping
updates are still sent to the other cluster nodes.

Setting number_of_replicas to 0 doesn't help either: the shards are
still distributed across nodes, so dynamic mapping updates are still slow.
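For reference, both settings mentioned above can be changed on a live index through the update settings API. This is a minimal sketch, assuming a node reachable on localhost:9200 and reusing the placeholder index name TheIndexName from later in this post; the request format may differ in other Elasticsearch versions:

```shell
# Disable periodic refresh and drop replicas while bulk indexing.
# TheIndexName is a placeholder; adjust host/port to your cluster.
curl -XPUT 'localhost:9200/TheIndexName/_settings' -d '{
  "index" : {
    "refresh_interval" : "-1",
    "number_of_replicas" : 0
  }
}'
```

After the bulk load, refresh_interval can be set back (for example to "1s", the usual default) and the replica count restored with the same kind of request.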

So we wanted to move all shards to one node, since we saw that indexing
the same data on a single-node setup is very fast.

In the docs at
http://www.elasticsearch.org/guide/reference/modules/cluster.html
we found a setting for creating an index whose shards only go to specific nodes:

curl -XPUT localhost:9200/test -d '{
"index.routing.allocation.include.tag" : "value1,value2"
}'
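The tag filter matches against a custom node attribute (the attribute name itself is arbitrary), so each node has to be started with a tag value. This is a sketch, assuming the 0.18-era convention of passing node settings as -Des. system properties; the same setting can instead go into config/elasticsearch.yml as node.tag: value1. The values value1 and value2 are the placeholders from the example above:

```shell
# On the first node: give it the custom attribute tag=value1
bin/elasticsearch -Des.node.tag=value1

# On the second node: give it tag=value2
bin/elasticsearch -Des.node.tag=value2
```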

Then it says:

"The provided settings can also be updated in real time using the
update settings API, allowing to “move” indices (shards) around in
realtime."

So we tried:

curl -XPUT localhost:9200/TheIndexName/_settings -d '{
  "index" : {
    "number_of_replicas" : 0,
    "routing.allocation.include.name" : "TheNodeName"
  }
}'

We got no replicas (as we requested), but the shards are still distributed
across nodes; the routing.allocation.include.name setting seems to be ignored.
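To see which node each shard actually landed on, the cluster state API includes the routing table. This is a sketch, assuming the same node on localhost:9200. In the output, look for TheIndexName in the routing sections and compare the node ids listed for its shards; the nodes section maps those ids to node names:

```shell
# Dump the cluster state, including where every shard is allocated
curl -XGET 'localhost:9200/_cluster/state?pretty=true'
```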

We need to do this for only one index. We noticed that the cluster API also
allows decommissioning specific nodes, but we need the other nodes to
keep working.

Any ideas? Could this be a bug?

We're using ElasticSearch 0.18.4. That feature was introduced in 0.18.0, so
it should work: https://github.com/elasticsearch/elasticsearch/issues/1311


(Ivan Brusic) #2

I am also having issues with bulk indexing to a cluster. The system
begins to crawl as the index gets larger. And just like you, I set the
number of replicas to zero and disabled the refresh interval. My
project is still under exploration, so I am only using two nodes.

My question is: how were you able to determine that dynamic mapping
updates are the cause of your problems? I have not paid attention to
the network chatter between the two boxes, but I am wondering if I
should.

Cheers,

Ivan

(In reply to Jorge Urdaneta's post of Wed, Feb 15, 2012 at 2:20 PM)


(Shay Banon) #3

First, regarding the allocation: are you trying to filter based on the node name (either the randomly generated one or one you set explicitly)? If so, you should use _name as the attribute (index.routing.allocation.include._name), not name.
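Applied to the request from the first post, this would look roughly like the following. This is a sketch, assuming TheNodeName is the node's actual name, whether randomly generated or set explicitly (e.g. with node.name in the node's configuration):

```shell
# Restrict all shards of TheIndexName to the node named TheNodeName.
# Note the leading underscore: _name refers to the built-in node name.
curl -XPUT 'localhost:9200/TheIndexName/_settings' -d '{
  "index" : {
    "number_of_replicas" : 0,
    "routing.allocation.include._name" : "TheNodeName"
  }
}'
```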

Regarding bulk being slow: how did you conclude that updating the mapping slows it down? Updating the mapping is an asynchronous process. Do you have a case where each new bulk item has new fields?
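One way to check whether new fields keep appearing during a bulk run is to fetch the index mapping before and after a batch and compare the two. This is a sketch, assuming the placeholder index name TheIndexName and a node on localhost:9200; the file names are hypothetical:

```shell
# Save the mapping before and after one bulk batch, then diff them.
curl -s -XGET 'localhost:9200/TheIndexName/_mapping?pretty=true' > mapping-before.json
# ... run one bulk batch here ...
curl -s -XGET 'localhost:9200/TheIndexName/_mapping?pretty=true' > mapping-after.json
diff mapping-before.json mapping-after.json
```

If every batch adds fields, every batch also triggers mapping updates, which become part of the cluster state shared by all nodes.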

(In reply to Jorge Urdaneta's post of Thursday, February 16, 2012 at 12:20 AM)

