I've recently had to reindex all docs due to multi-type mappings being deprecated.
There is one observation that I can't explain...
After reindexing (via the _reindex API) I found that the new index has more documents than the original one. How is that possible? I narrowed down which documents were new and tried to search for them in the old index, but with no luck...
Since the difference is pretty small, it would be interesting to proceed by dichotomy, using a date_histogram aggregation on both indices to see which buckets the differences fall in. Can you run this aggregation on both of your indices and see in which month (you might use year or day as well) the differences appear? Then we can drill down further until we find the culprit.
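As a rough sketch of that comparison, the Python below builds a date_histogram request body and diffs the bucket counts from two responses. The field name "order_date" and the aggregation name "per_bucket" are made up for illustration, and sending the body to each index is left to whatever client you use. Depending on the Elasticsearch version, the interval key is "interval" (older releases) or "calendar_interval" (7.2 and later), so it is a parameter here.

```python
def date_histogram_body(field, interval, interval_key="calendar_interval"):
    """Build a search body that returns only per-bucket document counts."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "per_bucket": {  # hypothetical aggregation name
                "date_histogram": {"field": field, interval_key: interval}
            }
        },
    }


def bucket_counts(response):
    """Map each bucket's key_as_string to its doc_count."""
    buckets = response["aggregations"]["per_bucket"]["buckets"]
    return {b["key_as_string"]: b["doc_count"] for b in buckets}


def diff_buckets(source_counts, dest_counts):
    """Return {bucket: dest - source} for buckets whose counts differ."""
    keys = sorted(set(source_counts) | set(dest_counts))
    return {
        k: dest_counts.get(k, 0) - source_counts.get(k, 0)
        for k in keys
        if dest_counts.get(k, 0) != source_counts.get(k, 0)
    }
```

Run the same body against the old and the new index, pass both responses through bucket_counts, and diff_buckets will list only the buckets worth drilling into.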
Our indices are monthly already. I did a visualisation which splits the data on the _index term, and nearly every index has some discrepancies.
I also used your suggested script and split yearly data (e.g. "index-2015*") by monthly interval, and nearly every month has a higher count in the new index.
Ok, then let's take one month and drill down, by day, hour, minute... until we find one doc that is in the destination index but not in the source index.
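The drill-down can also be automated as a bisection over a time range. This is only a sketch: count_source and count_dest are hypothetical callables that you would implement yourself (for example as a range-filtered count request against each index), each taking a half-open window [start, end) in epoch milliseconds and returning a document count.

```python
def narrow_window(count_source, count_dest, start, end, min_width):
    """Bisect [start, end) down to a window of at most min_width
    in which the two indices report different document counts.

    Returns (start, end) of the narrowed window, or None when the
    counts already agree over the whole range.
    """
    if count_source(start, end) == count_dest(start, end):
        return None
    while end - start > min_width:
        mid = start + (end - start) // 2
        if count_source(start, mid) != count_dest(start, mid):
            end = mid    # the first half already disagrees
        else:
            start = mid  # first half agrees, so the second half must differ
    return start, end
```

One caveat: this compares counts only, so a window where one document is missing and another is extra would look identical. Once the window is small enough, listing the _id values on both sides and comparing them is the more reliable check.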
I won't be able to share it as it has information about a customer and a specific order that was placed. What I tried to do was to pick some potentially unique fields, e.g. order value, channel id it came from, and do a search globally, hoping that something would be found but with no success.
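For the "search on a few potentially unique fields" approach, a minimal sketch of the request body is below. The field names in the usage example are invented, since the real mapping isn't shown. It uses term queries in a bool filter, which match the exact indexed value, so on analyzed text fields they can miss a document that is actually there; an unanalyzed (keyword) field, if the mapping has one, is the safer target.

```python
def exact_match_body(field_values):
    """Build a search body requiring every given field to equal its value exactly."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: value}}
                    for field, value in field_values.items()
                ]
            }
        }
    }


# Hypothetical field names and values, for illustration only.
body = exact_match_body({"order_value": 129.99, "channel_id": "web-7"})
```

Sending the same body to the old and the new index pattern should make it clear whether the document is really absent from the source or just not matching the way it was searched for.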
Is it possible that the document has been deleted in the source index in the meantime (or during the reindex)? Do you have a process that deletes (old/rotten) documents based on some condition?
Sorry, not sure what you mean by diff. That document doesn't seem to exist - or at least I can't get it to surface - in the old index, so other than the fact that it exists in one index but not in the other, there is nothing else I can compare.