Hi!
So, we have two Elasticsearch environments up and running with CCR configured.
Everything has been working fine so far, but after checking the document counts we noticed a mismatch between the clusters. This only seems to affect "bigger" indices.
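For reference, we compare the counts roughly like this, using one of the affected indices as an example (index and cluster names taken from the log entry further down):

On the leader cluster (elastic-uswest):
GET visit-global-standard-w2019.30-1024/_count

On the follower cluster (elastic-useast):
GET ccr-visit-global-standard-w2019.30-1024/_count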
When digging deeper, we found that GET _ccr/stats returns:
...
"recent_auto_follow_errors" : [
{
"leader_index" : "AutoFollowerPatterns",
"timestamp" : 1564653416136,
"auto_follow_exception" : {
"type" : "exception",
"reason" : "RemoteTransportException[[10.176.39.38][10.176.39.38:9300][cluster:monitor/state]]; nested: NodeClosedException[node closed {10.176.39.38}{wOe54CAlTKW47GD473DCoA}{qjM8vEWrS5ybm0D2bB8Iaw}{10.176.39.38}{10.176.39.38:9300}{rack_id=us-west-2c, xpack.installed=true, zone=us-west-2}];",
"caused_by" : {
"type" : "node_closed_exception",
"reason" : "node closed {10.176.39.38}{wOe54CAlTKW47GD473DCoA}{qjM8vEWrS5ybm0D2bB8Iaw}{10.176.39.38}{10.176.39.38:9300}{rack_id=us-west-2c, xpack.installed=true, zone=us-west-2}"
}
}
}
]
...
In the Elasticsearch logs we see the following entry:
[2019-08-01T09:56:56,590][WARN ][o.e.x.c.a.ShardFollowTasksExecutor] [10.176.68.180] [ccr-visit-global-standard-w2019.30-1024][1] background management of retention lease [elastic-useast/ccr-visit-global-standard-w2019.30-1024/FkRtqtL-Q1aujddl0UvzxA-following-elastic-uswest/visit-global-standard-w2019.30-1024/7vymksCxRdeqkz-ThYACcg] failed while following
org.elasticsearch.transport.NodeNotConnectedException: [10.176.39.38][10.176.39.38:9300] Node not connected
at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:151) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:557) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:516) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:251) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:209) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:100) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:62) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:143) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.8.1.jar:6.8.1]
...
I'm not sure whether those two messages are related, but at first glance it looks like they are.
So my question is: how can I make sure that the missing data still gets replicated, and how can I prevent this from happening in the first place?
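In case the configuration matters, the auto-follow pattern is set up roughly like this (the pattern name and remote cluster alias below are placeholders, not our exact values):

PUT /_ccr/auto_follow/visit-global-standard
{
  "remote_cluster" : "elastic-uswest",
  "leader_index_patterns" : [ "visit-global-standard-*" ],
  "follow_index_pattern" : "ccr-{{leader_index}}"
}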
Thanks in advance!
P.S. We are using Elasticsearch 6.8.1.