CCR replication paused indices

Jairam_Gauns · January 24, 2023, 12:12pm

Hi Guys,

I am new to this CCR setup. It worked fine when we had fewer, smaller indices, but we now have many indices, and CCR was not monitored for quite some time. We found a few paused indices, and after resuming them it took more than 4 hours for them to get back in sync. I expected replication to be close to real time, but that is not happening, so I need some help on how to handle replication to the DR site. We have an Enterprise license.
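
Since the paused indices went unnoticed for a while, here is a minimal sketch of how they could be picked out of a follower info response. It assumes the response has the shape returned by GET /_all/_ccr/info (a "follower_indices" list whose entries carry "follower_index" and "status"). The index names and statuses in the sample are made up for illustration and are not from our cluster.

```python
def paused_followers(ccr_info: dict) -> list[str]:
    """Return the names of follower indices whose status is 'paused'."""
    return sorted(
        entry["follower_index"]
        for entry in ccr_info.get("follower_indices", [])
        if entry.get("status") == "paused"
    )


# Hypothetical sample shaped like a GET /_all/_ccr/info response.
sample_info = {
    "follower_indices": [
        {"follower_index": "follower-a", "leader_index": "leader-a", "status": "active"},
        {"follower_index": "follower-b", "leader_index": "leader-b", "status": "paused"},
        {"follower_index": "follower-c", "leader_index": "leader-c", "status": "paused"},
    ]
}

if __name__ == "__main__":
    print(paused_followers(sample_info))
```

Running something like this on a schedule and alerting when the list is non-empty would be one way to avoid finding paused indices months later.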

Running GET /_ccr/stats today, I see:
"auto_follow_stats" : {
    "number_of_failed_follow_indices" : 21753791,
    "number_of_failed_remote_cluster_state_requests" : 16,
    "number_of_successful_follow_indices" : 0,
}
The exception I see for auto-follow is "reason" : "index to follow [abc_au_error_2022.10.01] for pattern [abc*] matches with other patterns [abc]".
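
The message above says one leader index is matched by more than one auto-follow pattern. A rough sketch of how to check which patterns claim a given index name follows. It assumes only `*` is used as a wildcard, and the `patterns` dictionary (pattern name mapped to its leader index patterns) is hypothetical: the error only shows the pattern names [abc*] and [abc], not what each one actually contains.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple '*' wildcard pattern into a compiled regex."""
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def matching_patterns(index_name: str, auto_follow_patterns: dict[str, list[str]]) -> list[str]:
    """Return the names of all auto-follow patterns that match index_name."""
    return sorted(
        name
        for name, globs in auto_follow_patterns.items()
        if any(wildcard_to_regex(g).fullmatch(index_name) for g in globs)
    )


# Hypothetical pattern definitions; the real ones would come from
# GET /_ccr/auto_follow on the follower cluster.
patterns = {
    "abc*": ["abc_*"],
    "abc": ["abc*"],
}

if __name__ == "__main__":
    # More than one name in the result means the patterns overlap.
    print(matching_patterns("abc_au_error_2022.10.01", patterns))
```

If two names come back for the same index, one of the overlapping auto-follow patterns probably has to be narrowed or removed.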

Individual indices also show various fatal exceptions:
"fatal_exception" : {
              "type" : "index_not_found_exception",
              "reason" : "no such index [abc_au_error_2022.02.20]"
}
"fatal_exception" : {
              "type" : "circuit_breaking_exception",
              "reason" : "[parent] Data too large, data for [<transport_request>] would be [24467651698/22.7gb], which is larger than the limit of [24398446592/22.7gb], real usage: [24467650928/22.7gb], new bytes reserved: [770/770b], usages [request=0/0b, fielddata=30948187/29.5mb, in_flight_requests=158504/154.7kb, accounting=620829855/592mb]",
              "bytes_wanted" : 24467651698,
              "bytes_limit" : 24398446592,
              "durability" : "PERMANENT"
}

"fatal_exception" : {
              "type" : "process_cluster_event_timeout_exception",
              "reason" : "failed to process cluster event (put-mapping) within 30s"
}

"fatal_exception" : {
              "type" : "resource_not_found_exception",
              "reason" : "Operations are no longer available for replicating. Existing retention leases [[RetentionLease{id='peer_recovery/lc-tMfA4QFyrm2M1XgbMVg', retainingSequenceNumber=6801469, timestamp=1662746628089, source='peer recovery'}, RetentionLease{id='peer_recovery/wgyOlKHmS--NyIWW9Le1ow', retainingSequenceNumber=6801469, timestamp=1662746628089, source='peer recovery'}]]; maybe increase the retention lease period setting [index.soft_deletes.retention_lease.period]?",
              "requested_operations_missing" : [
                "6797017",
                "6801468"
              ],
              "caused_by" : {
                "type" : "illegal_state_exception",
                "reason" : "Not all operations between from_seqno [6797017] and to_seqno [6801468] found; expected seqno [6797017]; found [Index{id='13500', type='_doc', seqNo=6801369, primaryTerm=7, version=23666, autoGeneratedIdTimestamp=-1}]"
              }
}
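
With this many indices it is hard to see which exception affects which follower. Here is a minimal sketch that groups follower indices by fatal exception type. It assumes the GET /_ccr/stats response keeps per-shard stats under "follow_stats" -> "indices" -> "shards", with an optional "fatal_exception" object on each shard. The follower index names in the sample are hypothetical; only the exception types are the ones shown above.

```python
from collections import defaultdict


def fatal_exceptions_by_type(ccr_stats: dict) -> dict[str, list[str]]:
    """Map each fatal exception type to the follower indices reporting it."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for index_entry in ccr_stats.get("follow_stats", {}).get("indices", []):
        for shard in index_entry.get("shards", []):
            exc = shard.get("fatal_exception")
            if exc:
                grouped[exc.get("type", "unknown")].add(index_entry["index"])
    return {exc_type: sorted(names) for exc_type, names in grouped.items()}


# Hypothetical sample shaped like a GET /_ccr/stats response.
sample_stats = {
    "follow_stats": {
        "indices": [
            {"index": "follower-a", "shards": [
                {"shard_id": 0, "fatal_exception": {"type": "circuit_breaking_exception"}},
                {"shard_id": 1},
            ]},
            {"index": "follower-b", "shards": [
                {"shard_id": 0, "fatal_exception": {"type": "resource_not_found_exception"}},
            ]},
            {"index": "follower-c", "shards": [
                {"shard_id": 0, "fatal_exception": {"type": "circuit_breaking_exception"}},
            ]},
        ]
    }
}

if __name__ == "__main__":
    for exc_type, names in fatal_exceptions_by_type(sample_stats).items():
        print(exc_type, names)
```

Grouping like this separates the failures by cause (memory pressure, missing leader indices, expired retention leases, slow cluster state updates), and each of those likely needs a different fix.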

Since I am new to this DR-related CCR work, any pointers on how to tackle this would be helpful.

system · February 21, 2023, 12:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
