Write to multiple indices via one alias


(Nicolas Lalevée) #1

This would be useful when I want to make a major change to the mapping. Say I have one index, used for both reads and writes. Now I create a new index with a new mapping. I then want queries to keep hitting the first index, while writes go to both and a full reindex runs against the second. When the full reindex is finished, I can switch reads to the new index and delete the first one. Smooth transition.
Currently all of this is managed on the client side: a configuration entry in the database lists the read and write indices. For reads, I am changing this so that it is managed via an alias. I was expecting to use aliases for writes too, but that is not possible; the docs say [1]:
"It is an error to index to an alias which points to more than one index."

I'm wondering why there is such a limitation.

Nicolas

[1] http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
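
As an illustration of the client-side setup described above, here is a minimal sketch. It assumes an in-memory stand-in for the cluster rather than a real Elasticsearch client, and the names `IndexRouting`, `InMemoryCluster`, `items_v1` and `items_v2` are hypothetical; none of them come from the original post.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexRouting:
    """Hypothetical client-side config: one index to read from, several to write to."""
    read_index: str
    write_indices: List[str] = field(default_factory=list)


class InMemoryCluster:
    """Stand-in for a real cluster: maps index name -> {doc_id: document}."""

    def __init__(self) -> None:
        self.indices: Dict[str, Dict[str, dict]] = {}

    def index(self, index: str, doc_id: str, doc: dict) -> None:
        self.indices.setdefault(index, {})[doc_id] = doc

    def get(self, index: str, doc_id: str):
        return self.indices.get(index, {}).get(doc_id)


def write(cluster: InMemoryCluster, routing: IndexRouting, doc_id: str, doc: dict) -> None:
    # Fan the write out to every configured write index.
    for name in routing.write_indices:
        cluster.index(name, doc_id, doc)


def read(cluster: InMemoryCluster, routing: IndexRouting, doc_id: str):
    # Reads only ever touch the single read index.
    return cluster.get(routing.read_index, doc_id)


cluster = InMemoryCluster()

# During the transition: read from the old index, write to both.
routing = IndexRouting(read_index="items_v1", write_indices=["items_v1", "items_v2"])
write(cluster, routing, "1", {"title": "hello"})

# Once the full reindex has finished: read from and write to the new index only.
routing = IndexRouting(read_index="items_v2", write_indices=["items_v2"])
```

The request in this thread amounts to moving the `write_indices` list out of the client and into an alias.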


(Shay Banon) #2

The main reason for the limitation is the response, which describes a
single "index" operation and is not something similar to a bulk response.
We could simply translate such an index operation into a bulk operation and
run it against multiple indices, but then representing the response is
problematic. I'm open to ideas here; possibly we could have an "additional"
response section with all the responses, or something similar.
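
To make the response problem concrete, here is a rough sketch of one possible shape, assuming an in-memory store. It is not how Elasticsearch behaves: the function `index_via_alias` and the `created` field are hypothetical, and only the idea of an "additional" section comes from the message above.

```python
def index_via_alias(aliases, store, alias, doc_id, doc):
    """Index one document into every index behind `alias`.

    `aliases` maps alias name -> list of index names (assumed non-empty).
    `store` maps index name -> {doc_id: document}.
    """
    items = []
    for name in aliases[alias]:
        docs = store.setdefault(name, {})
        created = doc_id not in docs
        docs[doc_id] = doc
        items.append({"_index": name, "_id": doc_id, "created": created})

    # Keep the usual single-index response shape for the first target,
    # and attach the remaining per-index results in an extra section.
    response = dict(items[0])
    response["additional"] = items[1:]
    return response


aliases = {"items": ["items_v1", "items_v2"]}
store = {}
response = index_via_alias(aliases, store, "items", "1", {"title": "hello"})
```

A client that only understands the single-index response keeps working, but it also silently ignores a failure reported under `additional`, which is part of why the representation is awkward.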


(Nicolas Lalevée) #3

On 9 May 2012 at 22:03, Shay Banon wrote:

The main reason for the limitation is the response, which describes a single "index" operation and is not something similar to a bulk response. We could simply translate such an index operation into a bulk operation and run it against multiple indices, but then representing the response is problematic. I'm open to ideas here; possibly we could have an "additional" response section with all the responses, or something similar.

OK, I see.
I don't have many ideas either, especially none that would avoid breaking the API.

Nicolas


(Clinton Gormley) #4

On Wed, 2012-05-09 at 23:03 +0300, Shay Banon wrote:

The main reason for the limitation is the response, which describes a
single "index" operation and is not something similar to a bulk response.
We could simply translate such an index operation into a bulk operation
and run it against multiple indices, but then representing the response
is problematic. I'm open to ideas here; possibly we could have an
"additional" response section with all the responses, or something
similar.

The typical use case for this is as Nicolas describes: copying all
current changes to a new index while copying the old data to the new
index.

Wouldn't a nicer solution be a river where you could specify the source
index and the destination index, and possibly a transformation script
(if some data needs to be changed between old and new)?

You'd probably have to enable the timestamp field to make this possible.
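
A minimal sketch of that idea, assuming plain dictionaries in place of real indices and a numeric `timestamp` field on every document. The function `copy_changes` and the field names are hypothetical; this is not an existing river implementation.

```python
def copy_changes(source, dest, since, transform=lambda doc: doc):
    """Copy documents newer than `since` from `source` to `dest`.

    `source` and `dest` map doc_id -> document. Each source document is
    assumed to carry a numeric "timestamp" field. Returns the highest
    timestamp seen, to be passed as `since` on the next run.
    """
    latest = since
    for doc_id, doc in source.items():
        ts = doc["timestamp"]
        if ts > since:
            dest[doc_id] = transform(doc)
            latest = max(latest, ts)
    return latest


def rename_field(doc):
    # Example transformation between the old and new mapping.
    return {"title": doc["name"], "timestamp": doc["timestamp"]}


old_index = {
    "a": {"name": "x", "timestamp": 1},
    "b": {"name": "y", "timestamp": 5},
}
new_index = {}

# First pass copies everything; later passes only pick up newer changes.
checkpoint = copy_changes(old_index, new_index, since=0, transform=rename_field)
old_index["a"] = {"name": "z", "timestamp": 7}
checkpoint = copy_changes(old_index, new_index, since=checkpoint, transform=rename_field)
```

Note that a timestamp filter like this cannot see deletions, so those would need separate handling.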

clint


(Jérémie BORDIER) #5

If Elasticsearch had support for collapsing, a possibility would be to
use two aliases:

  • One for searching that spans across the old and new indices
  • One for indexing that only points to the new index
  • Collapse on doc type + doc id with higher score on the new index

Unfortunately, ES doesn't support collapsing so far.
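
For illustration, the collapsing step could be approximated on the client side along these lines. This is only a sketch: `collapse_hits` is a hypothetical helper, the hits are plain dictionaries, and it simply prefers the new index instead of boosting its score.

```python
def collapse_hits(hits, preferred_index):
    """Keep one hit per (type, id), preferring hits from `preferred_index`."""
    best = {}
    for hit in hits:
        key = (hit["_type"], hit["_id"])
        current = best.get(key)
        if current is None:
            best[key] = hit
        elif hit["_index"] == preferred_index and current["_index"] != preferred_index:
            # The same document exists in both indices: take the new copy.
            best[key] = hit
    return list(best.values())


hits = [
    {"_index": "old", "_type": "doc", "_id": "1", "title": "stale"},
    {"_index": "new", "_type": "doc", "_id": "1", "title": "fresh"},
    {"_index": "old", "_type": "doc", "_id": "2", "title": "only in old"},
]
collapsed = collapse_hits(hits, preferred_index="new")
```

Doing this in the client breaks paging and hit counts, which is why server-side collapsing would be needed for the approach to work well.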

--
Jérémie 'ahFeel' BORDIER


(Jack Chen) #6

+1 for the ability to update multiple indices via an alias


(Clinton Gormley) #7

On Wed, 2012-05-09 at 18:51 +0200, Nicolas Lalevée wrote:

This would be useful when I want to make a major change to the mapping. Say I have one index, used for both reads and writes. Now I create a new index with a new mapping. I then want queries to keep hitting the first index, while writes go to both and a full reindex runs against the second. When the full reindex is finished, I can switch reads to the new index and delete the first one. Smooth transition.
Currently all of this is managed on the client side: a configuration entry in the database lists the read and write indices. For reads, I am changing this so that it is managed via an alias. I was expecting to use aliases for writes too, but that is not possible; the docs say [1]:
"It is an error to index to an alias which points to more than one index."

Are there any other use cases for writing to multiple indices?

I ask because the above use case will probably be better handled by the
planned changes stream.

clint


(Jörg Prante) #8

I agree, because I think writing to multiple indices is only a special case
of a more general one.

One exciting use case for a changes stream would be having a client that is
fed by pushed "change events" from one index, and transporting them to
other targets, possibly to a new index locally or even to another index on
a remote cluster, whatever.
A backup/restore operation would be required to complement this use case by
reading the indexed docs and writing them in bulk mode to the targets, in case
the change stream hook is activated later than the index was created. I'm not
sure how the scan/scroll operation works; if it serves the docs in
the order they were indexed, it would be perfect. I still have to find out how
to handle the point at which bulk mode should stop and event mode
steps in.
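
One possible way to handle that handoff, sketched under the assumption that the change events can be buffered starting before the bulk read begins and that replaying an already-applied event is harmless. The function `migrate` and the event tuples are hypothetical; no such changes stream API exists in the thread.

```python
from collections import deque


def migrate(snapshot, buffered_events, target):
    """Bulk-copy a snapshot, then replay the events buffered meanwhile.

    `snapshot` maps doc_id -> document, read after buffering started.
    `buffered_events` is a deque of (op, doc_id, doc) tuples in arrival
    order, where op is "index" or "delete".
    """
    # Bulk mode: copy everything the snapshot contains.
    for doc_id, doc in snapshot.items():
        target[doc_id] = doc

    # Event mode: drain the buffer in order. Events that were already
    # reflected in the snapshot are simply applied again.
    while buffered_events:
        op, doc_id, doc = buffered_events.popleft()
        if op == "index":
            target[doc_id] = doc
        elif op == "delete":
            target.pop(doc_id, None)
    return target


snapshot = {"1": {"v": "a"}, "2": {"v": "b"}}
events = deque([
    ("index", "1", {"v": "a"}),   # already visible in the snapshot
    ("index", "3", {"v": "c"}),   # arrived during the bulk copy
    ("delete", "2", None),        # arrived during the bulk copy
])
target = migrate(snapshot, events, {})
```

Because the buffer starts before the snapshot and is replayed in order, the target ends up in the same state as the source regardless of where exactly the snapshot fell between events.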

Best regards,

Jörg


(Paul Smith) #9

This is almost 'transaction log shipping' in a traditional DB model, and I
think would be awesome!

I just wanted to confirm whether this 'changes stream' is related to Jörg's
Websockets transport plugin. Clint mentioned it as if it were part of the
ES roadmap, and I know Shay talked about plans for an inter-DC replication
mechanism at some point, so I wanted to confirm whether the changes stream
concept is what that was about. Maybe this Websockets thing is another way
to think of this?

When I read about the Websockets plugin and the changes stream, it didn't
click what that meant, but this is quite exciting. I can see how, in
our case and maybe others, one could use the dynamic settings to restrict
the current index to a certain number of nodes, leaving a set of nodes carved
out for the new index (partitioning the IO load). The changes stream could
then be interleaved with another process doing the raw reindex of
the original data, so live changes continue to be applied, and
any VersionConflictException can simply be trapped and ignored as a
case of 'OK, the latest information is already there'. At the end
of the process you have a newly created index with all recent changes
applied as well, ready for a hot-swap, perhaps via an alias.
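
The conflict-trapping part of this could look roughly like the sketch below. It assumes each write carries an externally supplied version number and uses an in-memory index; `VersionedIndex` and `VersionConflict` are hypothetical stand-ins, not Elasticsearch classes.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class VersionedIndex:
    """In-memory index that only accepts strictly newer versions of a document."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, document)

    def index(self, doc_id, doc, version):
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            raise VersionConflict(f"{doc_id}: have {current[0]}, got {version}")
        self.docs[doc_id] = (version, doc)


def apply_ignoring_conflicts(index, writes):
    """Apply (doc_id, doc, version) writes, skipping any that are out of date."""
    skipped = 0
    for doc_id, doc, version in writes:
        try:
            index.index(doc_id, doc, version)
        except VersionConflict:
            # The target already holds the same or newer data.
            skipped += 1
    return skipped


new_index = VersionedIndex()

# A live change arrives first through the changes stream...
live_changes = [("1", {"title": "edited"}, 2)]
skipped_live = apply_ignoring_conflicts(new_index, live_changes)

# ...then the raw reindex reaches the older copy of the same document.
reindexed = [("1", {"title": "original"}, 1), ("2", {"title": "other"}, 1)]
skipped_reindex = apply_ignoring_conflicts(new_index, reindexed)
```

With this rule the two processes can run in any interleaving and the new index still converges on the latest version of each document.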

Is that sort of the idea you had in mind, Jörg?

cheers,

Paul
