Problem with index aliases and delete operation?


(Benjamin Devèze) #1

Suppose we use the "usersdata flow" described in kimchy's "Big Data, Search
and Analytics" presentation to index users' documents in a big oversharded
index with routing+filtering aliases. A user's content is moved to its own
index when it becomes too big.

From the client's perspective, we don't know whether we are manipulating a
virtual index via an alias or a real index dedicated to the user, and the
code should be the same either way. So if a user is deleted and we want to
remove their content, the client code does something like curl -XDELETE
localhost:9200/$INDEX_NAME. The power of aliasing is that the client does
not need to know whether it is dealing with a real index or an aliased one.

But with the current behavior, if $INDEX_NAME is an alias, the whole big
oversharded index is deleted and all users' documents are lost!

IMHO this is really dangerous. If $INDEX_NAME is a real index under the
hood, I was expecting a simple delete index. If $INDEX_NAME is an alias
pointing to the big oversharded index, I was expecting maybe a delete by
query (even if it is expensive), or at least a failure, perhaps
configurable via a parameter.
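
A minimal client-side sketch of that expected behavior, in Python. It only
decides which operation to run and sends nothing to the cluster. The
function name plan_delete, the index and alias names, and the assumed shape
of the parsed GET /_aliases response (index name -> {"aliases": {...}}) are
illustrative assumptions, not something taken from the Elasticsearch code.

```python
def plan_delete(name, aliases_response):
    """Decide how to remove the data behind `name`.

    `aliases_response` is assumed to be the parsed JSON of GET /_aliases,
    shaped like {"<index>": {"aliases": {"<alias>": {...}}}}.
    Returns a tuple (action, indices).
    """
    # A real index: dropping the whole index is what the caller wants.
    if name in aliases_response:
        return ("delete_index", [name])

    # An alias: collect the indices it points to and never drop them.
    targets = sorted(
        index
        for index, info in aliases_response.items()
        if name in info.get("aliases", {})
    )
    if targets:
        return ("delete_by_query", targets)

    # Neither an index nor an alias.
    return ("not_found", [])


# Hypothetical cluster: one shared index with a per-user alias, and one
# dedicated index for a user who outgrew the shared one.
cluster = {
    "users_shared": {
        "aliases": {
            "user_1": {
                "filter": {"term": {"user_id": 1}},
                "index_routing": "1",
                "search_routing": "1",
            }
        }
    },
    "user_42": {"aliases": {}},
}

print(plan_delete("user_42", cluster))
print(plan_delete("user_1", cluster))
print(plan_delete("nobody", cluster))
```

For the "delete_by_query" case, the client would then run a delete by query
through the alias rather than an index delete, so that (as far as I
understand) the alias filter and routing restrict the deletion to that
user's documents.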

What do you think?

--


(Chris Male) #2

That does sound a little unexpected, certainly in the use case you put
forward. Why don't you open an issue so we can explore what improvements
can be made at the code level?

In reply to Benjamin Devèze's post of Friday, October 12, 2012 11:36:48 AM
UTC+13.

--


(Benjamin Devèze) #3

Thanks!

Done here: https://github.com/elasticsearch/elasticsearch/issues/2318

In reply to Chris Male's post of Friday, October 12, 2012 1:30:30 AM UTC+2.

--

