Elasticsearch Cross Cluster Replication connect_timeout

We are running two Elasticsearch clusters on version 6.8.1: the first in our production environment and the second in our development environment. We're attempting to replicate the data from the production site to our development site so we can use it in testing for an application that will be querying Elasticsearch.

We've opened port 9300 between these two environments, and I can see in the Kibana GUI that the production cluster shows as connected under Remote Clusters. However, when I create a follower index to test replication from production to development, the shards on the development cluster fail to allocate with a connect_timeout:
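For reference, the setup we followed looks roughly like this in the Kibana Dev Tools console on the dev cluster. The remote alias `production`, the seed host `PROD_HOST`, and the follower index name are placeholders for our actual values; `builds-20200410` is the leader index from the log below:

```
# Register the production cluster as a remote on the dev cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "production": {
          "seeds": ["PROD_HOST:9300"]
        }
      }
    }
  }
}

# Verify the remote shows as connected
GET _remote/info

# Create the follower index on the dev cluster
PUT /builds-20200410-follower/_ccr/follow
{
  "remote_cluster": "production",
  "leader_index": "builds-20200410"
}
```

Note that `GET _remote/info` (and the Kibana Remote Clusters page) only proves the node serving that request can reach production; the node recovering each follower shard also has to connect out on 9300.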

[2020-04-13T15:56:47,245][WARN ][o.e.c.r.a.AllocationService] [devMaster] failing shard [failed shard, shard [builds-20200410][3], node[fwXUzBLmQDWEAjTLPJCVCw], [P], recovery_source[snapshot recovery [GlgFqTxrRIGxpWIldwRPdg] from _ccr_production:_latest_/_latest_], s[INITIALIZING], a[id=3FBNINKYSk2eXzVb-dw5ww], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-04-13T15:55:16.414Z], failed_attempts[4], failed_nodes[[jItPymr0QzifCI9Km3UkOg, Zpl7miC4T_SFsm2RRBUi1A, bxgXoKIBTpKYyEADXfCFTg, fwXUzBLmQDWEAjTLPJCVCw]], delayed=false, details[failed shard on node [Zpl7miC4T_SFsm2RRBUi1A]: failed recovery, failure RecoveryFailedException[[builds-20200410][3]: Recovery failed on {DevMaster2}{Zpl7miC4T_SFsm2RRBUi1A}{xE00h9O8Sbq7PMuUK1rjqw}{DevMaster2}{IP:9300}{dilm}{ml.machine_memory=135020195840, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: ConnectTransportException[[][IP:9300] connect_timeout[30s]]; ], allocation_status[fetching_shard_data]], expected_shard_size[0], message [failed recovery], failure [RecoveryFailedException[[builds-20200410][3]: Recovery failed on {DevMaster3}{fwXUzBLmQDWEAjTLPJCVCw}{EJvLsrXsTX2pdZ2SvJsYFQ}{DevMaster3}{IP:9300}{dilm}{ml.machine_memory=67378692096, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: ConnectTransportException[[][IP:9300] connect_timeout[30s]]; ], markAsStale [true]]
org.elasticsearch.indices.recovery.RecoveryFailedException: [builds-20200410][3]: Recovery failed on {DevMaster3}{fwXUzBLmQDWEAjTLPJCVCw}{EJvLsrXsTX2pdZ2SvJsYFQ}{DevMaster3}{IP:9300}{dilm}{ml.machine_memory=67378692096, xpack.installed=true, ml.max_open_jobs=20}
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2584) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.5.2.jar:7.5.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:353) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:283) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1867) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2580) ~[elasticsearch-7.5.2.jar:7.5.2]
        ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
        at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:480) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:285) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:308) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:283) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1867) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2580) ~[elasticsearch-7.5.2.jar:7.5.2]
        ... 4 more
Caused by: org.elasticsearch.transport.ConnectTransportException: [][IP:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:995) ~[elasticsearch-7.5.2.jar:7.5.2]
        ... 4 more
[2020-04-13T15:56:48,180][DEBUG][o.e.a.a.c.s.r.RestoreClusterStateListener] [DevMaster] restore of [_latest_/_latest_] completed

If you are running Elasticsearch 6.8.1 in both clusters, why does the stack trace indicate Elasticsearch 7.5.2?

Apologies for that. I forgot that we initially upgraded but had to downgrade our production cluster because the CloudBees Jenkins Elasticsearch plugin is not compatible with Elasticsearch 7.x. Would the dev instance being on 7.5.2 be the issue?

To clarify my previous incorrect statement: the production cluster is running 6.8.1, and we're attempting to replicate data to our development cluster running 7.5.2. If needed, I can downgrade the dev cluster.
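A quick way to confirm what each cluster is actually running is the root endpoint, which reports the node's version (run against a node in each cluster):

```
GET /
# The response includes a "version" object, e.g.
# "version": { "number": "6.8.1", ... }  on production
# "version": { "number": "7.5.2", ... }  on development
```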

Based on this table, it looks like that combination should be fine, so I'll need to leave this for someone with more experience.
