Reindexing Strategy


(stratawing) #1

I need to update my mappings and will therefore have to reindex, but I'll be
doing so directly from an existing ES index (I'm using ES as my data store).
I've read other posts and think the following strategy will work, but I have
a couple of questions:

Here's the strategy:

1 - create a new index with the new mappings and a different index name
2 - extract the data from the old index (e.g., using scroll)
3 - bulk load that data into the new index
4 - confirm consistency between the indices (how do I do this?)
5 - add an alias to the new index that maps the old index name to it
6 - delete the old index
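
For steps 2 and 3, each page of scroll hits can be turned into a bulk request body. Below is a minimal sketch in Python. It assumes each hit carries the usual `_id` and `_source` fields and targets a recent, typeless Elasticsearch (the 2012-era bulk API also expected a `_type` in each action line). The function name and the index names are hypothetical. Sending the body (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) and driving the scroll loop are left out:

```python
import json


def hits_to_bulk_body(hits, target_index):
    """Build a newline-delimited _bulk body that re-indexes scroll hits.

    Assumes each hit is a dict with '_id' and '_source', as found in
    hits.hits of a search/scroll response. The bulk format is one action
    line followed by one source line per document, and the whole body
    must end with a newline.
    """
    lines = []
    for hit in hits:
        # Keep the original _id so the new index holds the same documents.
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # An empty page produces an empty body (nothing to send).
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical page of hits read from the old index.
    page = [
        {"_id": "1", "_source": {"title": "first"}},
        {"_id": "2", "_source": {"title": "second"}},
    ]
    print(hits_to_bulk_body(page, "myindex_v2"), end="")
```

Reusing the old `_id` values matters for the later checks: it lets documents in the two indices be matched one to one.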

I have two questions regarding the strategy above:

1 - I'm not clear on how to confirm that the new index is fully consistent
with the old one. Any suggestions? I understand that I may need to halt
writes to the old index while the comparison is being performed.

2 - There will be a moment (however brief) when both the new index (with
the alias) and the old index are receiving requests. Once the alias is in
place, will searches against both indices return duplicates in the result
set (i.e., the same doc showing up twice)? If so, that might cause problems
in my application. It's probably a minor concern, but I'd like to hear
whether anyone else has had issues "swapping" indices using the alias
functionality in this manner.

Many thanks in advance for your input.



(Ivan Brusic) #2

For question 1, I am unsure what your definition of "consistency" might
be. The availability of documents? The number of documents/fields? The
relevancy of certain searches? Changing your mapping will affect your
index, and by definition it will not be the same as the original one.
Consistency in the data world usually means the consistent propagation of
changes. Updating a mapping would not affect that.
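
If "consistency" means the availability of documents, one rough check is to scroll both indices, collect each into an `{_id: _source}` map, and diff the two maps. Below is a minimal sketch. It assumes documents were copied without transformation (a mapping change normally affects how fields are indexed, not the stored `_source`) and that writes to the old index are paused. `compare_indices` is a hypothetical helper, and holding both indices in memory only makes sense for small data sets:

```python
def compare_indices(old_docs, new_docs):
    """Compare two {_id: _source} dicts collected by scrolling each index.

    Returns (missing, extra, changed): ids only in the old index, ids only
    in the new index, and ids present in both whose _source differs.
    """
    old_ids = set(old_docs)
    new_ids = set(new_docs)
    missing = sorted(old_ids - new_ids)
    extra = sorted(new_ids - old_ids)
    # Only ids present in both indices can be compared field by field.
    changed = sorted(
        doc_id for doc_id in old_ids & new_ids
        if old_docs[doc_id] != new_docs[doc_id]
    )
    return missing, extra, changed


if __name__ == "__main__":
    old = {"1": {"title": "a"}, "2": {"title": "b"}}
    new = {"1": {"title": "a"}, "3": {"title": "c"}}
    print(compare_indices(old, new))  # (['2'], ['3'], [])
```

Three empty lists mean every document made it across unchanged. Any non-empty list points at the specific ids to re-copy or investigate.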

Question 2:
I use aliases for searching and incremental updates, and real index names
for full indexing. Once a new index is built, you can send a single atomic
aliases request that removes the alias from the old index and adds it to
the new one. Searches only know about the alias, never the real index
name.
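
The atomic swap Ivan describes corresponds to a single request to the `_aliases` endpoint containing a `remove` action and an `add` action. Below is a minimal sketch that only builds the JSON body. It assumes the application always searches through an alias (`myindex` here) and that `myindex_v1` and `myindex_v2` are the real index names. All three names and the helper are hypothetical. Using an alias from the start also avoids a snag in step 5 of the original plan: current Elasticsearch versions reject an alias whose name matches an existing index, so otherwise the old index would have to be deleted before its name could be reused as an alias.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` atomically.

    Both actions are applied as one request, so a search through the
    alias resolves to either the old index or the new one, not both.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = alias_swap_body("myindex", "myindex_v1", "myindex_v2")
    print(json.dumps(body, indent=2))
```

Because the alias never points at both indices at once, searches through it should not return the same document twice during the switch.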

Cheers,

Ivan

(In reply to stratawing, Wed, Aug 29, 2012 at 10:54 AM)


(stratawing) #3

Thanks Ivan!

Very helpful answer on question 2. Regarding question 1, I really just
want to confirm that all documents from the old index were successfully
transferred to the new index. In light of your comments, I believe the
best approach is to compare document/field counts and to check the
responses from the bulk requests for write errors during the transfer.
Unless I'm missing something, that should give me enough information to
confirm that everything was transferred. Let me know if I'm mistaken.
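
The check described above can be sketched as two small helpers. The first pulls failed items out of a `_bulk` response. The second combines that with a document count comparison. This sketch assumes the response shape of current Elasticsearch versions (a top-level `errors` flag and an `items` list whose entries are keyed by action type). It also assumes counts come from `GET /<index>/_count`, taken after refreshing the new index and with writes to the old index paused. The helper names are hypothetical. A matching count does not by itself show that the same documents arrived, only that the same number did.

```python
def bulk_failures(bulk_response):
    """Return (_id, error) pairs for failed items in a _bulk response.

    Assumes a top-level "errors" flag and an "items" list in which each
    entry is keyed by its action type ("index", "create", ...).
    """
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        for result in item.values():
            # Failed items carry an "error" object alongside their status.
            if "error" in result:
                failures.append((result.get("_id"), result["error"]))
    return failures


def transfer_complete(old_count, new_count, bulk_responses):
    """True when the document counts match and no bulk item failed."""
    no_errors = all(not bulk_failures(r) for r in bulk_responses)
    return old_count == new_count and no_errors
```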

Thanks again!

(In reply to Ivan Brusic, Wednesday, August 29, 2012 6:30:25 PM UTC-4)

