We currently have roughly 4000 indices with 3 shards each. Each index
will contain approximately 100k docs.
I have been testing with the node_initial_primaries_recoveries
setting. Checking the health afterwards, it only seems to make a
difference in the number of initializing shards:
{
"active_primary_shards": 404,
"active_shards": 404,
"cluster_name": "engagor",
"initializing_shards": 300,
"number_of_data_nodes": 3,
"number_of_nodes": 3,
"relocating_shards": 0,
"status": "red",
"timed_out": false,
"unassigned_shards": 12766
}
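(For reference, the output above is from the cluster health API; a call along these lines, with the host and port adjusted to your own setup, should return it:)

  curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'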
Shards still become active very slowly (roughly one every 5 to 7
seconds across the three nodes).
In the logs there seems to be an issue with starting shards: the
"shard started" messages bounce back and forth between the master and
the node.
See the master logs for one specific index/shard:
[2011-06-04 16:42:20,289][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:25,044][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:29,966][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:34,770][DEBUG][cluster.action.shard ] [Washout]
received shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
For the same index/shard on one of the nodes:
[2011-06-04 16:42:13,064][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:17,897][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:22,637][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:27,574][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
[2011-06-04 16:42:32,378][DEBUG][cluster.action.shard ] [Vishanti]
sending shard started for [shard3309-2011-3][0],
node[xIB_TnLmRX27a2m7U4n9kA], [P], s[INITIALIZING], reason [master
[Washout][xR0LOX4eS2WUnsoUFmfssg][inet[/10.10.10.2:9300]] marked shard
as initializing, but shard already started, mark shard as started]
On Jun 4, 4:07 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Also, how many indices do you have? For such small indices, make sure to just allocate one shard per index (you have a lot of shards).
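(As an illustration only, not something from the thread: creating an index with a single shard could look like the request below; the index name and replica count are placeholders.)

  curl -XPUT 'http://localhost:9200/my_index/' -d '{
    "settings" : {
      "index.number_of_shards" : 1,
      "index.number_of_replicas" : 1
    }
  }'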
On Saturday, June 4, 2011 at 5:05 PM, Shay Banon wrote:
By default, it will throttle at 4 concurrent primary allocations per node (which is the important one you want to get to as fast as possible). You can set cluster.routing.allocation.node_initial_primaries_recoveries to a higher value and it will cause more shards to be allocated concurrently.
This throttling is done so a machine will not be overloaded; it might make sense in your case to use a higher value.
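(A minimal sketch of raising that limit, assuming the value 10 is just an example: add the line below to elasticsearch.yml on each node and restart it.)

  cluster.routing.allocation.node_initial_primaries_recoveries: 10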
On Saturday, June 4, 2011 at 4:11 PM, Engagor wrote:
Throttling seems to be the issue I'm having. See the following gist
for debug logs from the master: gist 1007894 on GitHub
The logs get spammed very fast with these kinds of entries.
Is there a setting I should change here?
Thanks in advance
Folke