Validation of data post reindex from remote

What is the best way to validate that data was migrated properly when doing a reindex?

We are migrating data from ES 1.7 to ES 5.1. We are planning to use the Reindex from Remote API, passing in a query to limit each run to a subset of the entire data set.
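For reference, here is a rough sketch of how such a request body could be built in Python. The host, index names, and the `created_at` date-range query are hypothetical placeholders. The body would be sent as `POST _reindex` to the 5.1 cluster, which also needs the 1.7 host listed under `reindex.remote.whitelist` in its configuration.

```python
import json


def build_reindex_body(remote_host, source_index, dest_index, query):
    """Build the JSON body for POST _reindex with a remote source and a query."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
            "query": query,
        },
        "dest": {"index": dest_index},
    }


# Hypothetical values, for illustration only.
body = build_reindex_body(
    "http://old-cluster:9200",
    "orders",
    "orders",
    {"range": {"created_at": {"gte": "2016-01-01", "lt": "2016-02-01"}}},
)
print(json.dumps(body, indent=2))
```

Keeping the query in one place like this also makes it easy to reuse the same query later when validating that subset.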

Our thinking is to save off the data somehow and then compare it. Possible ways we were looking into doing validation:

  • Using the pagination API. The documentation for it, however, seems to recommend using the scroll API instead.
  • Using the scroll API
  • Using ElasticDump

Alternatives to this include validating subsets of the data rather than the whole data dump, especially if the data is excessively large.

However, all of this may be moot if Elasticsearch already does some sort of internal validation while migrating data. Or maybe there is another utility available for this type of validation?

Thanks,
Tom

Why not just query Elasticsearch 1.7 for the _id (or any unique field) and then check that the record exists in 5.1?

There is no need to dump the data to do that, and you can even validate at the individual record level once you have the basic script written.

However, a simple record count should tell you whether every document was transferred; if something did not transfer, you would know it just by the total record count. I would also expect whatever tool you are using to migrate the data to report errors.
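A minimal sketch of the count comparison in Python, assuming both clusters expose the `_count` endpoint over plain HTTP. The URLs and index name in the comments are hypothetical, and `http_count` is an illustrative helper, not part of any client library. The destination index may need a refresh before its count reflects everything that was written.

```python
import json
from urllib.request import Request, urlopen


def http_count(base_url, index, query=None):
    """Return the document count reported by <index>/_count.

    Sends a POST with {"query": ...} when a query is given, otherwise a GET.
    """
    body = json.dumps({"query": query}).encode("utf-8") if query else None
    req = Request(
        f"{base_url}/{index}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["count"]


def compare_counts(old_count, new_count):
    """Summarize the difference between source and destination counts."""
    return {
        "old": old_count,
        "new": new_count,
        "missing": old_count - new_count,
        "match": old_count == new_count,
    }


# Example wiring (hypothetical hosts and index name):
#   old = http_count("http://old-cluster:9200", "orders")
#   new = http_count("http://new-cluster:9200", "orders")
#   print(compare_counts(old, new))
```

Passing the same query that was used for the reindex lets you compare counts for just the migrated subset.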

Hi,

Thanks for the feedback. My main issue is we could potentially have millions of documents, and we want to limit downtime. To check on every ID would be costly, in terms of time, I would think.

We are thinking of selecting every 1,000th document (or some other interval), scrolling with the scroll API, and comparing the sampled documents between ES 1.7 and 5.1.
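Roughly, the sampling idea could look like the sketch below. It assumes the old documents arrive as scroll hits (dicts with `_id` and `_source`) and that the new cluster is reached through a lookup function; `fetch_new_source` is a hypothetical callable standing in for a GET by ID against 5.1. Documents are compared by hashing `_source` with sorted keys, so field order does not matter.

```python
import hashlib
import json


def fingerprint(source):
    """Hash a document's _source with sorted keys so field order is ignored."""
    canonical = json.dumps(source, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def sample_every(hits, step):
    """Yield every step-th hit (starting with the first) from an iterable."""
    for i, hit in enumerate(hits):
        if i % step == 0:
            yield hit


def compare_sample(old_hits, fetch_new_source, step=1000):
    """Compare sampled old documents against the new cluster.

    old_hits: iterable of dicts with "_id" and "_source", as scroll returns them.
    fetch_new_source: hypothetical callable mapping _id -> _source dict or None.
    Returns (missing_ids, different_ids).
    """
    missing, different = [], []
    for hit in sample_every(old_hits, step):
        new_source = fetch_new_source(hit["_id"])
        if new_source is None:
            missing.append(hit["_id"])
        elif fingerprint(new_source) != fingerprint(hit["_source"]):
            different.append(hit["_id"])
    return missing, different


# In-memory stand-ins for the two clusters, for illustration only.
old = [{"_id": str(i), "_source": {"n": i}} for i in range(5)]
new = {"0": {"n": 0}, "2": {"n": 99}}
print(compare_sample(old, new.get, step=2))
```

Because the old side is streamed and only one sampled document is looked up at a time, memory use stays flat regardless of index size.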

Is there no other built-in validation? Can we just assume that if the reindex completes, it was 100% successful?

Tom

There is no validation tool, as everyone's data is different and how it gets validated is different.

If you're asking me whether you can presume your transfer of data is 100% successful, I can't answer that, as I don't know how you're transferring the data from 1.7 to 5.1.

It should be successful, presuming there are no errors, but yes, you should take some steps to validate your data: make sure fields are the correct type and that all of the documents have been brought over.
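As a first check for errors, something like the following sketch could inspect the JSON that `_reindex` returns. The field names (`failures`, `version_conflicts`, `total`, `created`, `updated`, `timed_out`) are what I understand the 5.x response to contain, so verify them against an actual response. The written-versus-total check is only a heuristic and would also flag legitimate no-ops.

```python
def reindex_problems(response):
    """Return a list of human-readable problems found in a _reindex response."""
    problems = []
    if response.get("timed_out"):
        problems.append("request timed out")
    failures = response.get("failures", [])
    if failures:
        problems.append(f"{len(failures)} failure(s) reported")
    conflicts = response.get("version_conflicts", 0)
    if conflicts:
        problems.append(f"{conflicts} version conflict(s)")
    # Heuristic: every document read from the source should have been written.
    written = response.get("created", 0) + response.get("updated", 0)
    total = response.get("total", 0)
    if written != total:
        problems.append(f"{total} documents read but {written} written")
    return problems


# Hand-written sample response, for illustration only.
sample = {"total": 100, "created": 98, "updated": 0,
          "version_conflicts": 2, "failures": []}
for problem in reindex_problems(sample):
    print(problem)
```

An empty list here only means the reindex reported nothing wrong; it does not replace comparing counts or spot-checking documents.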

I am sorry I cannot be more explicit in my answers.

Hi Ed,

We are just using the Reindex from Remote API request (_reindex).

Since there is no 'official' answer, we will probably just go with the approach of choosing some of the documents to validate, unless it's feasible to check every one, which probably isn't the case given we want to limit downtime.

Thanks,
Tom
