Node flapping between creating node and discarding it


(mrflip) #1

We're seeing a node in the ring flapping in recovery:

https://gist.github.com/752406

As you can see, it's caught in a loop where it starts recovering shard
[tweet-201010][3]:

[01:03:32,962][DEBUG][indices.memory ] [Centurion] recalculating shard indexing buffer (reason=created_shard[tweet-201010][3]), total is [512mb] with [3] shards, each shard set to [170.6mb]
[01:03:32,962][DEBUG][index.shard.service ] [Centurion] [tweet-201010][3] state: [CREATED]->[RECOVERING]

and then very shortly discards it:

[01:03:32,968][DEBUG][index.shard.service ] [Centurion] [tweet-201010][3] state: [RECOVERING]->[CLOSED]
[01:03:32,971][DEBUG][indices.memory ] [Centurion] recalculating shard indexing buffer (reason=removed_shard[tweet-201010][3]), total is [512mb] with [3] shards, each shard set to [170.6mb]

This cycles as often as one or two times a minute.

The cluster shows as green, with one relocating shard:

{
  "active_primary_shards": 32,
  "number_of_nodes": 16,
  "number_of_data_nodes": 16,
  "unassigned_shards": 0,
  "timed_out": false,
  "active_shards": 64,
  "initializing_shards": 0,
  "relocating_shards": 1,
  "status": "green",
  "cluster_name": "hoolock"
}
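
For anyone scripting around this, here's a small sketch (ours, not from the gist; `shards_in_flight` is a hypothetical helper) that flags when health output like the above still shows shards in motion:

```python
import json

# Cluster health as returned by GET /_cluster/health (copied from the post).
health_json = """
{
  "active_primary_shards": 32,
  "number_of_nodes": 16,
  "number_of_data_nodes": 16,
  "unassigned_shards": 0,
  "timed_out": false,
  "active_shards": 64,
  "initializing_shards": 0,
  "relocating_shards": 1,
  "status": "green",
  "cluster_name": "hoolock"
}
"""

def shards_in_flight(health):
    """Count shards the cluster is still moving or building."""
    return (health["relocating_shards"]
            + health["initializing_shards"]
            + health["unassigned_shards"])

health = json.loads(health_json)
print(shards_in_flight(health))  # 1: the perpetually relocating shard
```

A "green" status alone won't catch this kind of flapping, since the cluster stays green the whole time; the relocating count is the tell.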

flip


(mrflip) #2

Interesting: the flapping node had equilibrated with two empty shards
and two full ones:

hoolock-data_esnode-9 /mnt$ du -mc /mnt/elasticsearch/data/hoolock/nodes/0/indices/10//index | sort -t/ -k9,10
4120 total
2068 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/0/index
2053 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/1/index
1 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/14/index
1 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/2/index

The cluster showed two recovering shards.

I shut that node down and moved its data directory completely away.
When it started up, it recreated the directory structure and adopted
exactly the same set of shards, getting complete copies of 0 and 1 and
incomplete copies of 14 and 2, same as before.

Then I restarted the process in place -- so that the new process came
up before the rest of the cluster started missing the esnode. It
started by recovering shard 2:

hoolock-data_esnode-9 /mnt$ du -mc /mnt/elasticsearch/data/hoolock/nodes/0/indices/10//index | sort -t/ -k9,10
5125 total
2068 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/0/index
2053 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/1/index
1 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/14/index
1005 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/2/index

While still in the process of doing that, it discarded shards 0 and 14
as redundant:

hoolock-data_esnode-9 /mnt$ du -mc /mnt/elasticsearch/data/hoolock/nodes/0/indices/10//index | sort -t/ -k9,10
3638 total
2053 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/1/index
1586 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/2/index ### <-- still in recovery

By the time it had finished recovering shard 2, it had also discarded
shard 1:

hoolock-data_esnode-9 /mnt$ du -mc /mnt/elasticsearch/data/hoolock/nodes/0/indices/10//index | sort -t/ -k9,10
2100 total
2100 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/2/index

It then started recovering shard 0 (!):

hoolock-data_esnode-9 /mnt$ du -mc /mnt/elasticsearch/data/hoolock/nodes/0/indices/10//index | sort -t/ -k9,10
2372 total
273 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/0/index
2100 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/2/index

and then recovered shard 14 too:

hoolock-data_esnode-9 /mnt$ du -mc /mnt/elasticsearch/data/hoolock/nodes/0/indices/10//index | sort -t/ -k9,10
6205 total
2068 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/0/index
2037 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/14/index
2100 /mnt/elasticsearch/data/hoolock/nodes/0/indices/tweet-201010/2/index

It's now happy and healthy, with no relocating shards and no
zero-length shards.

However, I think something is causing elasticsearch to be too
aggressive about initiating recovery, and/or causing cluster
balancing to "resonate".
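
If it helps anyone else hitting this: one knob that might damp the restart-time churn (a guess on our part, not something we've verified fixes the flapping) is the gateway recovery settings in elasticsearch.yml, which hold off recovery until enough of the cluster is back. The values below are hypothetical, sized for a 16-node cluster:

```yaml
# elasticsearch.yml -- hypothetical values, tune for your cluster size
gateway:
  recover_after_nodes: 14   # don't begin recovery until most nodes have joined
  recover_after_time: 5m    # ...then wait a little longer for stragglers
  expected_nodes: 16        # start immediately once every node is present
```
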

flip



(mrflip) #3

I'll reply to my own issue and note that Kimchy fixed the problem in
0.14.0-pre (commit 22fa91efa041ec5ec36e, Dec 23). It was an issue with
non-master nodes connecting to each other, possibly having to do with
higher network latency on EC2.
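
If latency was indeed the trigger, loosening zen fault detection may also help on EC2. These discovery.zen.fd settings exist in elasticsearch, but the values below are guesses, not tested ones:

```yaml
# elasticsearch.yml -- hypothetical timeouts for high-latency EC2 networking
discovery.zen.fd.ping_interval: 5s   # default 1s
discovery.zen.fd.ping_timeout: 60s   # default 30s
discovery.zen.fd.ping_retries: 5     # default 3
```
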

\m/
flip


