Unassigned Shards

We currently have roughly 4000 indices with 3 shards each. Each index
will contain approximately 100k docs.

I have been testing with the node_initial_primaries_recoveries
setting. Checking the cluster health afterwards, it only seems to make
a difference in the number of initializing shards:
{
"active_primary_shards": 404,
"active_shards": 404,
"cluster_name": "engagor",
"initializing_shards": 300,
"number_of_data_nodes": 3,
"number_of_nodes": 3,
"relocating_shards": 0,
"status": "red",
"timed_out": false,
"unassigned_shards": 12766
}
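
(For reference, the health output above comes from the cluster health
API, called roughly like this; host and port are assumed to be the
local node:)

curl -s 'http://localhost:9200/_cluster/health?pretty=true'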

Shards still become active very slowly (roughly one every 5 to 7
seconds across the three nodes).

In the logs there seems to be an issue with starting shards: the same
"shard started" message bounces back and forth between the master and
the node.

See the master's logs for one specific index/shard:
[2011-06-04 16:42:20,289][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:25,044][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:29,966][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:34,770][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]

For the same index/shard on one of the nodes:
[2011-06-04 16:42:13,064][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:17,897][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:22,637][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:27,574][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:32,378][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]

On Jun 4, 4:07 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Also, how many indices do you have? For such small indices, make sure to just allocate one shard per index (you have a lot of shards).
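
(To illustrate that suggestion: creating a new index with a single
shard would look roughly like this; the index name and replica count
below are placeholders, not values from this thread:)

curl -XPUT 'http://localhost:9200/someindex/' -d '{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'

Note that the shard count of an existing index cannot be changed, so
the existing indices would have to be recreated and reindexed.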

On Saturday, June 4, 2011 at 5:05 PM, Shay Banon wrote:

By default, it will throttle 4 concurrent primary allocations per node (which is the important one you want to get to as fast as possible). You can set cluster.routing.allocation.node_initial_primaries_recoveries to a higher value and it will cause more shards to be allocated concurrently.

This throttling is done so a machine will not be overloaded; it might make sense in your case to use a higher value.
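
(A sketch of raising that setting as a static node setting, assuming
the usual config/elasticsearch.yml location; the value 8 is only an
example, and each node needs a restart to pick it up:)

echo 'cluster.routing.allocation.node_initial_primaries_recoveries: 8' >> config/elasticsearch.yml
# restart the node afterwards so the new value takes effect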

On Saturday, June 4, 2011 at 4:11 PM, Engagor wrote:

Throttling seems to be the issue I'm having. See gist 1007894 on
GitHub for debug logs from the master.

The logs get spammed very fast with these kinds of entries.

Is there a setting I should change here?

Thanks in advance
Folke