I need to re-index multiple indices prior to an update of our stack.
To test the reindexing process, I am cloning an index. However, cloning requires the index to be read-only. My question is: what happens if data is ingested during the read-only phase? Will it be lost?
Probably yes. To clone an index, it must be in read-only mode, which blocks writes, so new data will not be added to the index while the block is in place.
Depending on how you are indexing the data, it may still be written after you remove the read-only block (for example, if the sender retries rejected writes), but you may also lose that data.
Clone and reindex are different operations. You can reindex an index that is still receiving data, but the reindex will only copy the data that already exists in the index when you trigger it. Any data added after that will not be present in the reindexed index, and you will need to run another reindex to get it.
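To make the clone steps concrete, here is a minimal sketch of the REST calls involved, written as plain Python data rather than a real client. The index names are made up, and nothing here talks to a cluster; it only lists the requests in the order I would expect to send them (block writes, clone, lift the block). Anything your ingest pipeline tries to write between the first and last call will be rejected by the source index.

```python
import json


def clone_requests(source, target):
    """Return the (method, path, body) REST calls for cloning an index, in order.

    This only builds request descriptions; sending them is left to whatever
    HTTP client or console you normally use.
    """
    return [
        # 1. Block writes on the source index; cloning requires this.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # 2. Clone the source into a new index.
        ("POST", f"/{source}/_clone/{target}", None),
        # 3. Reset the write block on the source so ingestion can resume.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": None}}),
    ]


if __name__ == "__main__":
    # "logs-old" and "logs-old-clone" are hypothetical index names.
    for method, path, body in clone_requests("logs-old", "logs-old-clone"):
        print(method, path, json.dumps(body) if body is not None else "")
```

The shorter you keep the window between step 1 and step 3, the less data is at risk.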
Regarding the reindexing, do you mean that data added, let's say even one week later, will not be reindexed? In my environment, data is being ingested all the time.
If you make a reindex request for an index now, and 1 millisecond later you add a new document, that new document will not be part of the reindex request and will not be reindexed.
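A toy model of that behaviour, using plain Python dictionaries instead of Elasticsearch (the `reindex` function here is a stand-in I made up, not the real API): the copy sees only the documents that exist at the moment it runs, and the source index stays fully writable afterwards.

```python
def reindex(source_docs):
    """Toy stand-in for a reindex: copy the documents that exist right now."""
    return dict(source_docs)


# The "old" index with two documents at the time of the reindex request.
old_index = {"1": {"msg": "a"}, "2": {"msg": "b"}}

new_index = reindex(old_index)

# A document arrives just after the reindex request was made.
# The old index accepts it (nothing is locked), but the copy never sees it.
old_index["3"] = {"msg": "arrived 1 ms later"}

missing = sorted(set(old_index) - set(new_index))
print(missing)  # ['3']
```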
So basically when you reindex, it sort of locks your index forever?
I need to reindex everything because my indices were created with an older version. Someone from Elasticsearch told me I need to reindex pretty much everything but never told me I wouldn't be able to work with my indices afterwards...
If you need to reindex your indices to upgrade their index version, you should stop writing to the old indices and start writing to new ones.
If you keep writing to the old indices, reindexing makes no sense, because a reindex is a snapshot of the data at the moment the request is made. You would need to run reindex after reindex to keep your data up to date.
In your case, you should start ingesting your data into new indices and reindex the old data from indices that are no longer being written to.
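As a rough sketch of that last step, assuming your writers have already been pointed at new indices: the function below builds one `POST /_reindex` request body per old index. The index names and the `-reindexed` suffix are hypothetical; pick whatever naming fits your setup. It only builds the requests and does not send them.

```python
import json


def reindex_requests(old_indices, suffix="-reindexed"):
    """Build one (method, path, body) reindex request per old index.

    Assumes nothing is writing to the old indices anymore, so a single
    reindex per index captures all of its data.
    """
    requests = []
    for name in old_indices:
        body = {
            "source": {"index": name},
            # Destination name is an illustrative convention, not a requirement.
            "dest": {"index": f"{name}{suffix}"},
        }
        requests.append(("POST", "/_reindex", body))
    return requests


if __name__ == "__main__":
    # Hypothetical names of indices created on the older version.
    for method, path, body in reindex_requests(["logs-2021", "logs-2022"]):
        print(method, path, json.dumps(body))
```

Once an old index is fully reindexed and you have verified the document counts match, you can delete it.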