What happens when new data is created during the reindexing procedure?


(Ted Kyungtae Kim) #1

I have run across descriptions of re-indexing data with zero downtime. However, what happens if new data comes in while the reindex is running? Does the re-indexed index reflect those changes?

We recently changed the mapping, which required us to reindex the data (about 1.8 documents). It took nearly an hour to finish. We shut down the server during the process because we weren't sure how to handle incoming data while it ran.

Is it guaranteed that we have the same dataset after the reindexing process? (We used the Python helpers.reindex.)

Thanks

-Ted


(Nik Everett) #2

Your helper used the scroll API to find the documents it reindexed. A scroll keeps the same snapshot of the index for as long as it is active, so documents created, updated, or deleted after the scroll starts won't be reflected in the reindex.
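
To make that concrete, here is a rough sketch of the kind of scan-then-bulk loop a reindex helper performs. It is not the helper's actual source: `retarget` and `copy_index` are names made up for this illustration, and `client` is assumed to be an `elasticsearch.Elasticsearch` instance.

```python
def retarget(hits, target_index):
    """Turn search hits into bulk index actions aimed at target_index."""
    for hit in hits:
        # Keeping the original _id means a re-copied document overwrites
        # its earlier copy instead of creating a duplicate.
        yield {
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }


def copy_index(client, source_index, target_index, scroll="5m"):
    """Scroll over source_index and bulk-index every hit into target_index."""
    # Imported lazily so retarget() can be used without the client library.
    from elasticsearch import helpers

    # scan() opens a scroll. The scroll only sees the index as it was when
    # the initial search ran, so later writes are not picked up.
    hits = helpers.scan(
        client,
        index=source_index,
        scroll=scroll,
        query={"query": {"match_all": {}}},
    )
    return helpers.bulk(client, retarget(hits, target_index))
```

Anything written to the source index after `scan()` has issued its first search is invisible to this loop, which is why the two indices can drift apart if writes continue.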


(Ted Kyungtae Kim) #3

Is there any way I can apply those changes after the reindexing? The number of indices keeps growing, and reindexing the data takes more and more time.

We shut down the server for three hours this time. Do we have to keep the server shut down for as long as it takes to reindex all of the data?

I know that changing mappings and reindexing are not daily tasks, but I still want to find a better way.

The page below shows how to use aliases, but that approach doesn't necessarily guarantee that the two datasets end up the same size.

https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html
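
For reference, the alias switch described on that page comes down to a single request that removes the alias from the old index and adds it to the new one. The sketch below only builds the request body; `alias_swap_actions` and the index names in the usage note are hypothetical, and how the body is passed to the client's update-aliases call depends on the client version.

```python
def alias_swap_actions(alias, old_index, new_index):
    """Build the body for an alias update that moves `alias` to new_index."""
    return {
        "actions": [
            # Both actions travel in one request, so readers of the alias
            # never see a moment where it points at no index.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, `alias_swap_actions("my_index", "my_index_v1", "my_index_v2")` yields the body for moving `my_index` from the v1 index to the v2 index. The swap itself says nothing about whether the two indices hold the same documents, which is the concern raised above.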


(David Pilato) #4

If your data has a field that tells you which documents are new or updated, you can run the scroll again with a query on that field instead of match_all.

That said, this won't work for deletes, so you have to remove those docs manually.

My 2 cents.
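
A minimal sketch of that catch-up pass, assuming the documents carry a timestamp field (called `updated_at` in the usage note; that name is hypothetical) and that `client` is an `elasticsearch.Elasticsearch` instance. The ID comparison used for deletes is just one possible way to find removed documents; it is not something the thread prescribes, and it holds every ID in memory.

```python
from datetime import datetime, timezone


def changed_since_body(field, since):
    """Search body matching documents whose `field` is at or after `since`."""
    return {"query": {"range": {field: {"gte": since.isoformat()}}}}


def catch_up(client, source_index, target_index, field, since):
    """Re-copy only the documents created or updated since `since`."""
    from elasticsearch import helpers  # lazy import: pure helpers need no client

    # The helper keeps each document's _id, so an updated document
    # overwrites its stale copy in the target index.
    return helpers.reindex(
        client, source_index, target_index,
        query=changed_since_body(field, since),
    )


def stale_ids(source_ids, target_ids):
    """IDs present in the target but no longer present in the source."""
    return set(target_ids) - set(source_ids)


def delete_actions(index, ids):
    """Bulk delete actions for the given document IDs."""
    for doc_id in ids:
        yield {"_op_type": "delete", "_index": index, "_id": doc_id}


def all_ids(client, index):
    """Collect every document ID in `index` without fetching _source."""
    from elasticsearch import helpers

    body = {"query": {"match_all": {}}, "_source": False}
    return {hit["_id"] for hit in helpers.scan(client, index=index, query=body)}


def remove_deleted(client, source_index, target_index):
    """Delete from the target every document that is gone from the source."""
    from elasticsearch import helpers

    gone = stale_ids(all_ids(client, source_index), all_ids(client, target_index))
    return helpers.bulk(client, delete_actions(target_index, gone))
```

A typical sequence would be to note the time just before the full reindex starts, run the full reindex, then call `catch_up(client, "old_index", "new_index", "updated_at", start_time)` followed by `remove_deleted(client, "old_index", "new_index")`. Writes that arrive during the catch-up pass itself are still missed, so a short pause in writes (or a further pass) may be needed before switching over.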

