Back story:
We pushed a new release on Wednesday around 2pm. At that time we deleted the index, set up the mapping, and reindexed everything. I watched the result count on our search page drop to 0, then climb back up to around 1k as the indexing task completed. Everything looked fine.
However, this morning (Thursday) search was suddenly returning twice as many results, over 2k. We do have a nightly refresh-index task that kicks off around 2:30am. Notice the "last_update": "2011-12-08T02:30:28" value on the second record, while the first record has "last_update": "2011-12-07T03:12:17", which is Tuesday night, before we did the release and before we deleted the index.
So the question is: how did these records come back from the dead? A replication error? Nodes not all in sync?
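For reference, the release step was roughly the following sequence, shown here as a minimal Python sketch against the REST API using the requests library. The node address, index name "items", doc type "item", mapping, and inline sample documents are all placeholders standing in for our real app code:

    import json
    import requests

    ES = "http://localhost:9200"   # assumed node address
    INDEX = "items"                # hypothetical index name
    MAPPING = {"item": {"properties": {"last_update": {"type": "date"}}}}

    # 1. Delete the old index (a 404 just means it did not exist yet)
    requests.delete("%s/%s" % (ES, INDEX))

    # 2. Recreate the index with the mapping
    requests.put("%s/%s" % (ES, INDEX),
                 data=json.dumps({"mappings": MAPPING}))

    # 3. Reindex everything; in the real task the documents come from the
    #    primary data store, a couple of inline examples stand in for that here
    docs = [
        {"id": 1, "last_update": "2011-12-08T02:30:28"},
        {"id": 2, "last_update": "2011-12-08T02:30:29"},
    ]
    for doc in docs:
        requests.put("%s/%s/item/%s" % (ES, INDEX, doc["id"]),
                     data=json.dumps(doc))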
When did you start to use Elasticsearch (with which version)? Maybe it was a version before 0.13.0, which used the type when hashing? If so, then any later version running against the same data should have cluster.routing.operation.use_type set to true in the settings; otherwise you can get into this situation.
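If that applies, the relevant line would go into the node settings (a sketch of elasticsearch.yml on each node, assuming the setting name mentioned above for 0.x-era Elasticsearch):

    cluster.routing.operation.use_type: true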