I found this thread a useful starting point for handling my own similar
issues. I did come up with another solution but wanted to run it by you to
see if you can spot any issues with it.
The basic concept is to perform all reads of an index through one alias and
all writes through another, e.g. index_read and index_write.
When a reindex is performed, a new index is created and added to the
index_write alias:
index_read -> old_index
index_write -> old_index, new_index
This way all updates come through on the new index while it is being
populated and the old index continues to return newly updated documents.
Once the reindexing is complete, I'd remove the old index from the read
alias and add the new one, and remove the old index from the write alias.
Then I could delete the old index and it should all be quite seamless.
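The alias changes described above map onto the `_aliases` endpoint, which applies every action in one request together. A minimal sketch of the two request bodies, assuming the alias and index names used in this message; the helper function names are hypothetical:

```python
import json

READ_ALIAS = "index_read"
WRITE_ALIAS = "index_write"


def start_reindex_actions(new_index):
    """Hypothetical helper: the new index joins the write alias only."""
    return {"actions": [
        {"add": {"index": new_index, "alias": WRITE_ALIAS}},
    ]}


def finish_reindex_actions(old_index, new_index):
    """Hypothetical helper: move reads to new_index and stop writes to old_index.

    All three actions go in one POST /_aliases body, so they are applied together.
    """
    return {"actions": [
        {"remove": {"index": old_index, "alias": READ_ALIAS}},
        {"add": {"index": new_index, "alias": READ_ALIAS}},
        {"remove": {"index": old_index, "alias": WRITE_ALIAS}},
    ]}


if __name__ == "__main__":
    # Each body would be the payload of: POST /_aliases
    print(json.dumps(start_reindex_actions("new_index"), indent=2))
    print(json.dumps(finish_reindex_actions("old_index", "new_index"), indent=2))
```

Deleting old_index afterwards is a separate `DELETE /old_index` request.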
I have my own Python code to access Elasticsearch, so the code that
references the two aliases is all in one place.
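A toy, in-memory model of keeping both alias names in one client-side place and fanning writes out to every index behind the write alias. It does not talk to Elasticsearch; the `AliasClient` class and its methods are hypothetical, and the fan-out happens on the client on the assumption that a single index request can't target an alias that resolves to more than one index:

```python
class AliasClient:
    """Hypothetical in-memory stand-in for a client that owns both alias names."""

    READ_ALIAS = "index_read"
    WRITE_ALIAS = "index_write"

    def __init__(self):
        self.indices = {}  # index name -> {doc_id: doc}
        self.aliases = {}  # alias name -> list of index names

    def create_index(self, name):
        self.indices[name] = {}

    def point(self, alias, *index_names):
        # Replace the alias' target list in one step.
        self.aliases[alias] = list(index_names)

    def index(self, doc_id, doc):
        # Fan the write out to every index behind the write alias.
        for name in self.aliases[self.WRITE_ALIAS]:
            self.indices[name][doc_id] = doc

    def delete(self, doc_id):
        # Deletes must reach every write target too, or one index keeps a stale copy.
        for name in self.aliases[self.WRITE_ALIAS]:
            self.indices[name].pop(doc_id, None)

    def get(self, doc_id):
        # Reads go through the read alias, which points at exactly one index here.
        (name,) = self.aliases[self.READ_ALIAS]
        return self.indices[name].get(doc_id)


if __name__ == "__main__":
    client = AliasClient()
    client.create_index("old_index")
    client.point(AliasClient.READ_ALIAS, "old_index")
    client.point(AliasClient.WRITE_ALIAS, "old_index")
    client.index("1", {"title": "before reindex"})

    # Reindex starts: the new index joins the write alias only.
    client.create_index("new_index")
    client.point(AliasClient.WRITE_ALIAS, "old_index", "new_index")
    client.index("2", {"title": "during reindex"})
    print(client.get("2"))  # still served by old_index
```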
Does this seem like a reasonable approach?
I think this could be made even easier if there was a mechanism similar to
the "search_routing" and "index_routing" features. Perhaps the aliases
could have add_search and add_index actions. Kimchy, does this sound like a
On Wednesday, September 14, 2011 7:23:04 AM UTC+10, Curtis Caravone wrote:
Good point, I'll have to think about this some more.
On Tue, Sep 13, 2011 at 2:15 PM, Shay Banon email@example.com wrote:
Taking in data while the old index is still around can be problematic if
you have updates or deletes. You might still need to apply the changes to
the old index, and buffer them while you reindex so you can apply them
again afterwards, or something like that.
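A minimal in-memory sketch of the buffering idea, assuming the bulk copy reads a point-in-time snapshot of the old index while live changes keep hitting it. The concurrency is simulated sequentially, dicts stand in for indices, and `apply_op` and `reindex_with_buffer` are hypothetical names. Replaying the buffered operations in order after the copy is what keeps later updates and deletes from being lost:

```python
def apply_op(index, op, transform=lambda doc: doc):
    """Apply one ("index", id, doc) or ("delete", id, None) operation to a dict index."""
    kind, doc_id, doc = op
    if kind == "index":
        index[doc_id] = transform(doc)
    elif kind == "delete":
        index.pop(doc_id, None)
    else:
        raise ValueError(f"unknown op {kind!r}")


def reindex_with_buffer(old, incoming_ops, transform):
    """Copy old -> new with `transform`, replaying changes that arrived during the copy."""
    snapshot = dict(old)            # the bulk copy reads a point-in-time view
    buffer = []
    for op in incoming_ops:         # live writes keep going to the old index...
        apply_op(old, op)
        buffer.append(op)           # ...and are also queued for the new one
    new = {doc_id: transform(doc) for doc_id, doc in snapshot.items()}
    for op in buffer:               # replay in arrival order once the copy is done
        apply_op(new, op, transform)
    return new
```

Without the replay step, the update to a copied document and the delete of a copied document would both be missing from the new index.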
On Wed, Sep 14, 2011 at 12:02 AM, Curtis Caravone firstname.lastname@example.org wrote:
Ok, I think that answers my questions.
The reason I proposed using multiple indices with aliases is that I want
to be able to take in new data while the migration is taking place. For
example:
index_old: index with old mapping and data
index_new: index with new mapping
At some point, switch writes to index_new, so it starts filling up with
new data.
Then, migrate data from old to new while the system is online and still
taking new data.
I want to be able to search across both old and new during this process,
similar to an online rebuild of a database index.
Is there a better way I could be doing this?
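One way to keep the migration from overwriting documents written to index_new after the switch is create-if-absent semantics, which is what the index API's `op_type=create` provides (the request fails if a document with that id already exists). Deletes are the harder case: a document deleted from index_new after the switch would be copied back from index_old unless the deletion is remembered somewhere. A toy sketch, assuming a hypothetical set of ids deleted since the switch; dicts stand in for the two indices and the function names are illustrative:

```python
def migrate(index_old, index_new, transform, deleted_since_switch=frozenset()):
    """Copy docs from index_old into index_new without clobbering newer state.

    Mirrors op_type=create: a doc whose id already exists in index_new is skipped.
    deleted_since_switch is a hypothetical record of ids deleted after writes moved.
    Returns the number of documents copied.
    """
    copied = 0
    for doc_id, doc in index_old.items():
        if doc_id in index_new or doc_id in deleted_since_switch:
            continue  # a newer write or a delete already happened after the switch
        index_new[doc_id] = transform(doc)
        copied += 1
    return copied


def get_during_migration(doc_id, index_old, index_new, deleted_since_switch=frozenset()):
    """Read path while both indices are live: prefer index_new, fall back to index_old."""
    if doc_id in index_new:
        return index_new[doc_id]
    if doc_id in deleted_since_switch:
        return None
    return index_old.get(doc_id)
```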
On Tue, Sep 13, 2011 at 1:16 PM, Shay Banon email@example.com wrote:
I think you are trying to use the aliases wrongly. When you reindex and
want to do "hot" replace of indices, you use a single alias pointing to a
single index as the one the "client" uses. For example, have alias1 point
to index1. Then, you reindex the data into index2, and once it's done, you
switch alias1 from index1 to index2 (in the same command).
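The single-command switch can be expressed as one `POST /_aliases` request whose `actions` list removes the old target and adds the new one; the actions are applied atomically, so clients never see alias1 without an index. A minimal sketch using only the standard library, assuming a node at `http://localhost:9200` (the host and the helper name are assumptions); it builds the request but leaves sending it to the caller:

```python
import json
import urllib.request


def swap_alias_request(alias, old_index, new_index, host="http://localhost:9200"):
    """Hypothetical helper: build (but do not send) an atomic alias switch."""
    body = {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}
    return urllib.request.Request(
        host + "/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = swap_alias_request("alias1", "index1", "index2")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a running cluster.
```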
In the above use case, there is no point where an alias points to more
than one index. Of course, you can have an alias point to more than one
index, but that only really makes sense when searching, and in that case
it's the same as searching across several indices, where the index is part
of the "uniqueness" of the document.
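A toy model of searching through an alias that resolves to several indices: each index is searched and every match is kept, so two documents sharing an id in different indices come back as two hits distinguished by their index name. The `search_alias` helper and the dict-based indices are hypothetical:

```python
def search_alias(indices, alias_targets, predicate):
    """Hypothetical: search every index behind an alias and keep all matches.

    Hits are (index_name, doc_id, doc); nothing is de-duplicated by id.
    """
    hits = []
    for name in alias_targets:
        for doc_id, doc in indices[name].items():
            if predicate(doc):
                hits.append((name, doc_id, doc))
    return hits


if __name__ == "__main__":
    indices = {
        "index1": {"42": {"title": "old copy"}},
        "index2": {"42": {"title": "new copy"}},
    }
    for hit in search_alias(indices, ["index1", "index2"], lambda doc: True):
        print(hit)
```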
On Tue, Sep 13, 2011 at 9:05 PM, Curtis Caravone firstname.lastname@example.org wrote:
Ok, I'm going with the strategy of creating a new index, but I want to
do all the reindexing online using aliases.
That leads to a couple of alias questions:
- How are doc ids treated when you do a get (or search) operation on
an alias with multiple underlying indices?
- What happens if two docs with the same id exist in two of the
underlying indices? Is there some precedence or order to which doc is
returned?
- Does the index status API work with aliases? For example, can I
wait for yellow status on an alias rather than listing all the underlying
indices?
On Mon, Sep 12, 2011 at 11:18 AM, Curtis Caravone email@example.com wrote:
Ok, thanks. In that case I will go the route of indexing into a new index.
On Mon, Sep 12, 2011 at 2:19 AM, Shay Banon firstname.lastname@example.org wrote:
You can't change a field from not being indexed to being indexed.
You can change the default analyzer, but this will only affect future
documents indexed (by closing the index, updating the index settings, and
then opening it).
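The close, update-settings, open sequence can be written out as three REST calls. This is a sketch assuming the analyzer is redefined under the name `default` in the index analysis settings; the index name, the analyzer definition, and the helper name are illustrative assumptions, and documents already indexed keep the tokens produced by the old analyzer:

```python
import json


def change_default_analyzer_steps(index, analyzer):
    """Hypothetical helper: (method, path, body) triples to run in order."""
    settings = {"analysis": {"analyzer": {"default": analyzer}}}
    return [
        ("POST", f"/{index}/_close", None),  # analysis settings can't change on an open index
        ("PUT", f"/{index}/_settings", json.dumps(settings)),
        ("POST", f"/{index}/_open", None),  # only documents indexed from now on use it
    ]


if __name__ == "__main__":
    steps = change_default_analyzer_steps(
        "index1",
        {"type": "custom", "tokenizer": "whitespace", "filter": ["lowercase"]},
    )
    for method, path, body in steps:
        print(method, path, body or "")
```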
If you end up reindexing the data, why not just index it into a new
index with the new mappings?
On Mon, Sep 12, 2011 at 3:00 AM, Curtis Caravone <email@example.com> wrote:
We are in a situation where we need to reindex a few hundred
million docs (add some indexed fields, add some new fields). We hope to do
this online by changing the mapping then performing updates on all the old
docs to reindex them with the new mapping.
Along these lines, I have a couple of questions:
- Can a field mapping be changed from "index":"no" to indexed
using the put mapping API?
- Can the default analyzer be changed with put mapping API?