Comparing Two Indices Across Clusters

Hi All,

We are currently trying to compare two indices across clusters to verify that we did not lose any records or fields during cross-cluster replication (CCR), which is how we chose to perform this data migration.

Our previous attempt called the search API on both clusters for the maximum of 10,000 records, sorted on our unique IDs, ran a deep equals locally between the two response bodies, and then searched the next 10,000 records. However, we are seeing inconsistencies in how these IDs are sorted on the two clusters, which causes the deep equals to fail.

Is there any reason these sorts could be inconsistent? The IDs are long alphanumeric strings.
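One way to take the sort order out of the equation is to key the hits from each cluster by the unique ID and compare the resulting dictionaries, instead of comparing the two response bodies position by position. A minimal sketch in Python, assuming each hit is a dict whose `_source` holds the unique ID under a hypothetical field name `record_id`; fetching the hits is left out. Because two clusters that sort differently may also page differently, the two inputs should cover the same set of IDs (for example the full result set from each side, or pages bounded by the same ID range).

```python
from typing import Any


def index_by_id(hits: list[dict[str, Any]], id_field: str = "record_id") -> dict[str, dict[str, Any]]:
    """Map each hit's unique ID (hypothetical field name) to its _source."""
    return {hit["_source"][id_field]: hit["_source"] for hit in hits}


def compare_hits(source_hits: list[dict[str, Any]],
                 target_hits: list[dict[str, Any]],
                 id_field: str = "record_id") -> dict[str, list[str]]:
    """Compare two sets of hits by ID, ignoring the order they were returned in."""
    source = index_by_id(source_hits, id_field)
    target = index_by_id(target_hits, id_field)
    # IDs present on only one side.
    missing = sorted(source.keys() - target.keys())
    extra = sorted(target.keys() - source.keys())
    # IDs present on both sides whose documents are not equal.
    changed = sorted(k for k in source.keys() & target.keys() if source[k] != target[k])
    return {"missing": missing, "extra": extra, "changed": changed}
```

With this approach, the same documents returned in a different order compare as equal, and only genuine differences are reported.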

Secondly, we were looking at this transform example here:

Is there anything similar to this that we could use across two different clusters? Or perhaps a way to compare both indices within a single search on one cluster? If you guys could point me in the right direction, I would greatly appreciate it!

Thanks,
Ethan

Welcome to our community! :smiley:

Are you using _id? Are you defining this yourself?

We aren't all guys, but I would suggest that you look at using CCR as the best option for this.

Thank you! Apologies for the general use of the word :slight_smile:

We were using a unique ID that we assigned to the records ourselves.

We have already performed the CCR successfully and verified that the document counts match between the respective indices on the two clusters. What we were hoping to do was a granular comparison at the record and field level, to make sure the data is exactly the same on both clusters.
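Once records have been matched by ID, a field-level check can report which fields differ rather than only that two documents differ. A minimal sketch in Python, assuming the inputs are the `_source` dictionaries for the same ID from each cluster; nested objects are walked recursively and reported as dotted paths, while arrays are compared as whole values, so a reordered array counts as a difference.

```python
from typing import Any

_MISSING = object()  # sentinel: the field is absent on one side


def diff_fields(left: Any, right: Any, path: str = "") -> list[str]:
    """Return dotted paths of fields whose values differ between two documents."""
    if isinstance(left, dict) and isinstance(right, dict):
        diffs: list[str] = []
        # Walk the union of keys so fields missing on either side are reported.
        for key in sorted(left.keys() | right.keys()):
            child = f"{path}.{key}" if path else key
            diffs.extend(diff_fields(left.get(key, _MISSING), right.get(key, _MISSING), child))
        return diffs
    # Leaves, arrays, type mismatches, and missing fields compare by plain equality.
    return [] if left == right else [path]
```

For example, two documents that differ only in a nested `address.city` value and an extra top-level field would yield those two paths.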

I also forgot to mention that we are currently using Elasticsearch 7.10.

That version is very old, so I would recommend upgrading to 7.17 as soon as possible.

Transforms support cross-cluster search (CCS), so you can use a transform for this task.
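As a rough sketch of what that could look like, the function below builds a pivot transform body (as a Python dict, so it can be checked) that reads from a local index and a remote index at the same time, groups by the unique ID, and counts the documents per ID. If the ID is unique within each index, any ID with a count other than 2 exists on only one side. The index names, the remote cluster alias, the destination index, and the `record_id` field (assumed to be mapped as `keyword`) are all hypothetical; the body would be sent to `PUT _transform/<transform_id>`. This only detects missing records; a field-level comparison would need something like the scripted metric approach in the transform example mentioned earlier in the thread.

```python
def build_comparison_transform(local_index: str, remote_alias: str, remote_index: str,
                               id_field: str = "record_id",
                               dest_index: str = "migration-comparison") -> dict:
    """Build a pivot transform body counting documents per unique ID across both indices.

    All names are hypothetical placeholders; id_field is assumed to be a keyword field.
    """
    return {
        "source": {
            # CCS syntax: "<remote_cluster_alias>:<index>" refers to an index on the remote cluster.
            "index": [local_index, f"{remote_alias}:{remote_index}"]
        },
        "dest": {"index": dest_index},
        "pivot": {
            # One destination document per unique ID.
            "group_by": {id_field: {"terms": {"field": id_field}}},
            # An ID present in both indices should end up with copies == 2.
            "aggregations": {"copies": {"value_count": {"field": id_field}}},
        },
    }
```

After the transform has run, a `range` query on `copies` with `lt: 2` against the destination index would list the IDs that exist on only one cluster.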