Unfollowing indices when leader is gone

javierE · March 2, 2020, 2:36pm

Hello,

I am currently testing CCR (cross-cluster replication) and simulating a scenario in which my primary site is lost.

I need to script a procedure that switches all my followers to R/W and resumes operations on the secondary site.
To achieve this, I loop over all followers doing pause_follow / close / unfollow / open.
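A minimal Bash sketch of that loop follows. It assumes the follower index names are passed as arguments. ES_URL, MAX_TIME and DRY_RUN are hypothetical settings introduced here: DRY_RUN=1 only prints the requests. The endpoints themselves are the standard CCR and index APIs (_ccr/pause_follow, _close, _ccr/unfollow, _open).

```shell
#!/usr/bin/env bash
# Sketch: promote CCR followers to regular R/W indices on the secondary site.
# Assumptions (not from the post): ES_URL, MAX_TIME and DRY_RUN are
# hypothetical settings; follower index names come from the command line.

ES_URL="${ES_URL:-http://secondary-cluster:9200}"
MAX_TIME="${MAX_TIME:-120}"   # seconds; caps requests that never return

es_post() {
  # $1 = request path, e.g. /indice-2019.01/_ccr/unfollow
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST $1"
    return 0
  fi
  # curl exits non-zero on transport errors and on --max-time (exit 28);
  # without -f, HTTP error responses such as 500 still exit 0.
  curl -sS --max-time "$MAX_TIME" -X POST "$ES_URL$1"
}

promote_follower() {
  local index="$1" step
  # Same order as the procedure above: pause, close, unfollow, reopen.
  for step in _ccr/pause_follow _close _ccr/unfollow _open; do
    if ! es_post "/$index/$step"; then
      echo "WARN: $step failed or timed out for $index" >&2
    fi
  done
}

for idx in "$@"; do
  promote_follower "$idx"
done
```

For example, `DRY_RUN=1 ./promote.sh indice-2019.01 indice-2019.01d` (script name hypothetical) prints the four requests per index without sending them. The sketch deliberately continues to the next step after a failure, since in the tests described here the index was usable after reopening even when unfollow did not complete cleanly.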

During this test I observed two different behaviours when unfollowing indices while the leader is unavailable:

For some indices, the unfollow call never returns (I waited for more than an hour), so I implemented a timeout for this case.
The cluster shows the unfollow task as still running; it never finishes.

 curl --max-time 120 -X POST "secondary-cluster:9200/indice-2019.01/_ccr/unfollow?pretty"
 curl: (28) Operation timed out after 120001 milliseconds with 0 out of -1 bytes received

and the output of _cat/tasks:

action                           task_id                     parent_task_id type      start_time    timestamp running_time ip            node
indices:admin/xpack/ccr/unfollow HNTVVJ9-Qemfv9Vq4sImng:3976 -              transport 1582919983003 19:59:43  2.6d         192.168.1.YYY node1b
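To spot these lingering tasks from a script, _cat/tasks can be filtered by action name with its actions parameter. The sketch below assumes a hypothetical ES_URL setting and a CURL override (so a stub can stand in for curl when testing); it is an illustration, not a recommended monitoring setup.

```shell
#!/usr/bin/env bash
# Sketch: list unfollow tasks that are still running on the secondary cluster.
# Assumptions: ES_URL is a hypothetical setting; CURL can be overridden
# (for example with a stub function) so the request can be inspected.

ES_URL="${ES_URL:-http://secondary-cluster:9200}"
CURL="${CURL:-curl}"

list_unfollow_tasks() {
  # actions= filters _cat/tasks by action name; v adds the header row.
  "$CURL" -sS "$ES_URL/_cat/tasks?actions=indices:admin/xpack/ccr/unfollow&v"
}
```

An empty result (header row only) means no unfollow task is outstanding on the cluster.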

For other indices, the unfollow call returns immediately with an error:

 curl --max-time 120  -X POST "secondary-cluster:9200/indice-2019.01d/_ccr/unfollow?pretty" 
 {
   "error" : {
     "root_cause" : [
       {
         "type" : "connect_transport_exception",
         "reason" : "[][192.168.1.XXX:9300] connect_exception"
       }
     ],
     "type" : "exception",
     "reason" : "ConnectTransportException[[][192.168.1.XXX:9300] connect_exception]; nested: AnnotatedConnectException[Connection refused: /192.168.1.XXX:9300]; nested: ConnectException[Connection refused];",
     "failed_to_remove_retention_leases" : "secondary-cluster/indice-2019.01d/WmQ9QlAgRtS8iqE1tXIY5Q-following-remote-prod/indice-2019.01d/X_dOSvl_Taadxq-yutWuaQ",
     "caused_by" : {
       "type" : "connect_transport_exception",
       "reason" : "[][192.168.1.XXX:9300] connect_exception",
       "caused_by" : {
         "type" : "annotated_connect_exception",
         "reason" : "Connection refused: /192.168.1.XXX:9300",
         "caused_by" : {
           "type" : "connect_exception",
           "reason" : "Connection refused"
         }
       }
     }
   },
   "status" : 500
 }
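The two outcomes can be told apart by curl's exit code and the HTTP status: --max-time ends a hung request with exit code 28 (as in the first case), while the immediate failure comes back as an HTTP 500. A minimal sketch, assuming hypothetical ES_URL, MAX_TIME and CURL settings; the status labels it prints are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run unfollow with a hard timeout and classify the two behaviours
# (hang vs. immediate error). Assumptions: ES_URL, MAX_TIME and the CURL
# override are hypothetical; the printed labels are invented here.

ES_URL="${ES_URL:-http://secondary-cluster:9200}"
MAX_TIME="${MAX_TIME:-120}"
CURL="${CURL:-curl}"

unfollow_status() {
  local index="$1" code rc
  # -o /dev/null drops the JSON body; -w prints only the HTTP status code.
  code="$("$CURL" -sS --max-time "$MAX_TIME" -o /dev/null -w '%{http_code}' \
    -X POST "$ES_URL/$index/_ccr/unfollow")"
  rc=$?
  if [ "$rc" -eq 28 ]; then
    echo "timeout"            # curl gave up: the request never returned
  elif [ "$rc" -ne 0 ]; then
    echo "curl-error:$rc"     # other transport-level failure
  elif [ "$code" = "200" ]; then
    echo "ok"
  else
    echo "http-error:$code"   # e.g. 500 with connect_transport_exception
  fi
}
```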

In both cases, _ccr/stats shows no followers, and if I open the index, everything seems OK and ready for R/W operations.

Should I be concerned about these long-running tasks? Is this the expected behaviour when unfollowing while the leader is unavailable?

ES Version: 7.4.2 (tgz distribution)

Thanks for your help.
Regards.

DavidTurner · March 2, 2020, 4:01pm

Hmm. I can imagine things for which unfollowing might be waiting (in vain) but I think you're right and it shouldn't be. Any chance you can try and reproduce this on 7.6, just in case it's something that's been fixed? If it persists in the latest version, would you open an issue on Github about this?

javierE · March 2, 2020, 8:36pm

Thanks for your reply,

I reproduced this behaviour in version 7.6.0. I also realized that the unfollow task hangs only for indices with more than one primary shard; with a single primary shard it returns the exception.

I will run a few more tests and open a ticket.
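Given that observation, a failover script on an affected version could check each follower's primary shard count before unfollowing. A minimal sketch using _cat/indices with h=pri; ES_URL, the CURL override and both function names are hypothetical, and the warning only reflects what was reported in this thread for 7.4.2 and 7.6.0.

```shell
#!/usr/bin/env bash
# Sketch: flag followers with more than one primary shard, which (per this
# thread, on 7.4.2 and 7.6.0) may hang on unfollow when the leader is down.
# Assumptions: ES_URL and the CURL override are hypothetical settings.

ES_URL="${ES_URL:-http://secondary-cluster:9200}"
CURL="${CURL:-curl}"

primary_shards() {
  # h=pri keeps only the primary shard column; strip padding and newlines.
  "$CURL" -sS "$ES_URL/_cat/indices/$1?h=pri" | tr -d '[:space:]'
}

warn_if_multi_shard() {
  local pri
  pri="$(primary_shards "$1")"
  # A non-numeric answer makes the test fail quietly, so no warning is shown.
  if [ "${pri:-0}" -gt 1 ] 2>/dev/null; then
    echo "WARN: $1 has $pri primary shards; unfollow may hang" >&2
    return 1
  fi
  return 0
}
```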

javierE · March 9, 2020, 12:29pm

For reference:

Open ticket: https://github.com/elastic/elasticsearch/issues/53174
Closed in: https://github.com/elastic/elasticsearch/pull/53262

DavidTurner · March 9, 2020, 1:10pm

Possibly one of the best-written bug reports I've ever read, thanks @javierE

system · April 6, 2020, 1:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
