So why does this need to happen, and what exactly needs to happen? I see that the reindex API is the recommended method, but it looks like a utility that just copies one index to another. I could be wrong, but I don't see how that would fix any problem. I could just create a "new" index with a mapping that matches my old one and use a data transfer utility.
So what is the actual problem I am trying to solve here? I have several indices that were created in 2.3 and are now in 5.5. I want to be able to look at them and know where the issues will be.
My guess is that Lucene is guaranteed to be backwards compatible for only
one major version. Elasticsearch 2.3 uses Lucene 5 [1], whereas
Elasticsearch 6 will use Lucene 7 [2].
Reindexing goes through the entire indexing process again, recreating the
Lucene indices. You simply cannot move the indices over. It would be more
efficient to simply use the Lucene index updater tool instead of
reindexing, if it exists in Lucene 7.
@Ivan - so for instance, in 2.3 I was using the `string` data type in a lot of my mappings. Will the reindex API actually change that mapping to `text` or `keyword` now?
Correct. Keep in mind that there is more data stored in Elasticsearch than
simply the Lucene indices. The cluster state has additional information as
well such as index settings. I never jumped two Elasticsearch versions. If
you can upgrade first to Elasticsearch 5.x, reopening Lucene indices that
are one version behind and then doing a force merge will update the Lucene
index format. Repeat again for Elasticsearch 6.
Of course, the old indices might have older mapping settings which cannot
be updated.
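If the destination index needs its mapping defined up front, the `string` to `text`/`keyword` translation could be sketched roughly as below. This is an illustration and not what Elasticsearch does internally: the helper name is made up, it only handles the `analyzed` (default) and `not_analyzed` cases, and it assumes the input is the `properties` section of a 2.x mapping.

```python
import copy


def upgrade_string_fields(properties):
    """Return a copy of a 2.x `properties` dict with `string` fields rewritten."""
    upgraded = {}
    for name, field in properties.items():
        field = copy.deepcopy(field)
        if field.get("type") == "string":
            # 2.x strings are analyzed unless `index` says otherwise.
            index_mode = field.pop("index", "analyzed")
            if index_mode == "not_analyzed":
                field["type"] = "keyword"
            elif index_mode == "analyzed":
                field["type"] = "text"
            else:
                # Other modes (e.g. "no") are left for a human to decide.
                raise ValueError(f"{name}: unhandled index mode {index_mode!r}")
        # Recurse into object sub-fields and multi-fields.
        for key in ("properties", "fields"):
            if key in field:
                field[key] = upgrade_string_fields(field[key])
        upgraded[name] = field
    return upgraded
```

Other string-only parameters (an `analyzer` on a field that becomes `keyword`, for example) are copied through unchanged here, so the result would still need a manual review.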
@Ivan - I will be upgrading from 5.5 (or 5.6) to 6.0 when the time comes. I am just posting based on the article I referenced in the original post. It says:
Reindex indices from Elasticsearch 2.x or before
Indices created in Elasticsearch 2.x or before will need to be reindexed with Elasticsearch 5.x in order to be readable by Elasticsearch 6.x. The easiest way to reindex old indices is to use the reindex API.
I do have indices that were originally created in 2.3. The process of updating to 5.5 "updated the indices", but I do not know if they were reindexed. Should I assume that because the database is working fine, this happened during the update?
I know that 6.0 is only going to allow one _type per index, so this is another change that I am currently working on. Just trying to get all my ducks in a row for when 6.0 is released.
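One way to see which version an index was created in, rather than assuming, is the `index.version.created` value returned by `GET /<index>/_settings`. The sketch below works on an already-fetched settings response. The decoding of the numeric ID (for example `2030399` for 2.3.3 and `5050099` for 5.5.0) is an assumption based on how those IDs are laid out, and the function names and index names are hypothetical.

```python
def decode_version(version_id):
    """Split an ID such as "2030399" into (major, minor, revision)."""
    vid = int(version_id)
    major = vid // 1_000_000
    minor = (vid // 10_000) % 100
    revision = (vid // 100) % 100
    return major, minor, revision


def indices_created_before(settings_response, major):
    """Map index name -> creation version for indices older than `major`."""
    old = {}
    for name, body in settings_response.items():
        created = body["settings"]["index"]["version"]["created"]
        version = decode_version(created)
        if version[0] < major:
            old[name] = version
    return old


# Hand-written sample shaped like a GET /_settings response.
sample = {
    "logs-2016": {"settings": {"index": {"version": {"created": "2030399"}}}},
    "logs-2017": {"settings": {"index": {"version": {"created": "5050099"}}}},
}
needs_reindex = indices_created_before(sample, major=5)
```

With the sample above, only `logs-2016` ends up in `needs_reindex`. If the creation version of your real indices still reads 2.x after the upgrade to 5.5, that would suggest they were not reindexed.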
From a Lucene perspective, running a force merge down to one segment will
create a new segment, with the latest version, since segments are immutable
and must be recreated. There must be other reasons why a reindex is
necessary, very likely due to mapping changes. Hopefully I have not wasted
your time and someone with more knowledge, like @dadoonet, will chime in.
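For completeness, the force merge mentioned above is a `POST /<index>/_forcemerge` with `max_num_segments=1`. A minimal sketch that only builds the request follows; the node URL and index name are assumptions.

```python
import urllib.parse
import urllib.request


def build_force_merge_request(index, es_url="http://localhost:9200",
                              max_num_segments=1):
    """Build (but do not send) a force merge request for one index."""
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    return urllib.request.Request(
        url=f"{es_url}/{index}/_forcemerge?{query}",
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would trigger the merge. That can be expensive on a large index, so it is probably best run during a quiet period.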
@Ivan I don't think you wasted anyone's time, and I appreciate you trying to help. Based on your previous post, I am starting to think I may be OK. My older indices were created in 2.3 but have been updated to 5.5, so maybe I am OK without a reindex?