Shard relocation keeps restarting?

Three nodes (1, 2, 3). I'm replacing one of them, so I added node 4 and told the cluster not to allocate anything to node 3. What it's now trying to do is move two shards off node 3; these are non-trivial in size (around 5GB) and the link isn't fast (around 40Mbps).
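For reference, the "don't allocate to 3" part was done with cluster-level allocation filtering, something along these lines (the exact attribute may differ, e.g. excluding by IP rather than node name):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "dev-monitor-3"
  }
}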

What happens as observed from

GET _cat/recovery?v&h=i,s,t,ty,st,shost,thost,f,fp,b,bp&s=st:desc&active_only=true

is that these shards make progress up to around 20% - 30% and then appear to restart, with progress dropping back to zero.

Over and over again. For hours and hours and hours and hours.

Sometimes there's nothing at all in the sending node's log when this happens, but from time to time (not always in sync with the relocations failing) it says things like:

[2018-10-26T15:56:07,264][WARN ][o.e.t.n.Netty4Transport  ] [dev-monitor-3] send message failed [channel: NettyTcpChannel{localAddress=/172.31.11.57:9300, remoteAddress=/172.16.1.205:33542}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-10-26T15:56:07,428][INFO ][o.e.d.z.ZenDiscovery     ] [dev-monitor-3] master_left [{dev-monitor-1}{M6uY-xHjQS250KMdYB2fHA}{MXc_8PjgREGaMdzN8bjwwQ}{172.16.2.38}{172.16.2.38:9300}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2018-10-26T15:56:07,428][WARN ][o.e.d.z.ZenDiscovery     ] [dev-monitor-3] master left (reason = failed to ping, tried [3] times, each with  maximum [30s] timeout), current nodes: nodes:
   {dev-monitor-4}{heLU-gDkRXug-S3Cexx1vw}{aThXvdV3R7ORitXzw2BtSA}{172.16.2.64}{172.16.2.64:9300}{ml.machine_memory=33729298432, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   {dev-monitor-3}{Bn33dAGxTseq4VPE48gr-A}{BZZS7yVFSHaKewRxkO_jyA}{172.31.11.57}{172.31.11.57:9300}, local
   {dev-monitor-2}{oRJuzLBKRumqstfiLicChw}{b2YB24yEQDCxb5LELMwf_A}{172.16.1.205}{172.16.1.205:9300}
   {dev-monitor-1}{M6uY-xHjQS250KMdYB2fHA}{MXc_8PjgREGaMdzN8bjwwQ}{172.16.2.38}{172.16.2.38:9300}, master

[2018-10-26T15:56:10,886][INFO ][o.e.c.s.ClusterApplierService] [dev-monitor-3] detected_master {dev-monitor-1}{M6uY-xHjQS250KMdYB2fHA}{MXc_8PjgREGaMdzN8bjwwQ}{172.16.2.38}{172.16.2.38:9300}, reason: apply cluster state (from master [master {dev-monitor-1}{M6uY-xHjQS250KMdYB2fHA}{MXc_8PjgREGaMdzN8bjwwQ}{172.16.2.38}{172.16.2.38:9300} committed version [5177850]])

This looks like a connectivity problem. dev-monitor-3 sent three consecutive pings to the master (dev-monitor-1) each of which received no response within 30 seconds. Also one of the channels between dev-monitor-3 and dev-monitor-2 was closed. I'd expect there to be messages in the master node's logs too, indicating that dev-monitor-3 temporarily left the cluster, which would cancel the ongoing recoveries.

It's possible the recovery is consuming the node's entire bandwidth and preventing higher-priority traffic like pings from getting through soon enough, particularly if there is a device with an excessively large buffer somewhere in the way. The default for indices.recovery.max_bytes_per_sec is 40MB/s (megabytes per second), so if you only have a 40Mbps (megabits per second) link then this alone could explain it. Try reducing indices.recovery.max_bytes_per_sec to something compatible with your network (e.g. 4mb == 4 megabytes per second == 32 megabits per second) and see if this gives more stability.
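For instance, a dynamic cluster settings update along these lines should apply the throttle (treat the 4mb value as a starting point and tune it to your link):

PUT _cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "4mb"
  }
}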

It's possible it's something else too, but this'd be my first guess.


Also, I'm curious, 40Mbps is a pretty narrow pipe these days. What's the story there? Are your nodes connected by satellite, for instance?

After the weekend it's still doing it. I'm going to accept the loss of those shards and close down the node I'm trying to get rid of.

I doubt it's bandwidth, as I reduced the number of concurrent recoveries to one and throttled the recovery to well under the link's capacity, at which point the single recovery got about twice as far (in percentage terms) as when I was running two at once. So I suspect it's a time limit of some sort, maybe a firewall dropping a TCP connection or something like that. I'm not intending to investigate further.
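For the record, the concurrency change was roughly this, with the bandwidth throttle applied via indices.recovery.max_bytes_per_sec as suggested above (the exact value isn't important beyond being well under the link's ~5MB/s):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 1
  }
}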
