Problem with index aliases and delete operation?

Benjamin_Deveze · October 11, 2012, 10:36pm

Suppose we use the "usersdata flow" presented in kimchy's "Big Data, Search
and Analytics" presentation to index users' documents in one big oversharded
index with routing+filtering aliases. A user's content is moved to its own
index when it becomes too big.
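
For readers unfamiliar with the setup, here is a minimal sketch of what one such routing+filtering alias could look like, written as the request body for the aliases API (POST /_aliases). The index name `users`, the alias naming scheme `user_<id>`, and the `user_id` field are assumptions made for illustration; they do not come from the presentation.

```python
import json


def user_alias_action(index, user_id):
    """Build one 'add' action for the aliases API (POST /_aliases).

    The alias filters on a hypothetical `user_id` field and routes on the
    same value, so each user's documents live on a single shard.
    """
    return {
        "add": {
            "index": index,                            # the shared oversharded index
            "alias": f"user_{user_id}",                # hypothetical naming scheme
            "filter": {"term": {"user_id": user_id}},  # only this user's documents
            "routing": str(user_id),                   # pin the user to one shard
        }
    }


payload = {"actions": [user_alias_action("users", 12)]}
print(json.dumps(payload, indent=2))
```

With such an alias in place, the client reads and writes through `user_12` exactly as it would through a dedicated index of the same name.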

From the client's perspective, we don't know whether we are manipulating a
virtual index via an alias or a real index dedicated to the user, and the
code should be the same in both cases. So if a user is deleted and we want
to remove their content, the client code does something like curl -XDELETE
localhost:9200/$INDEX_NAME. The power of aliasing is that the client does
not need to know whether it is dealing with a real index or an alias.

But with the current behavior, if $INDEX_NAME is an alias, the whole big
oversharded index is deleted and all users' documents are lost!

IMHO this is really dangerous. I was expecting a simple delete index if
$INDEX_NAME is a real index under the hood. If $INDEX_NAME is an alias
pointing to the big oversharded index, I was expecting a delete by query
(even if it is expensive), or at least a failure, perhaps configurable via
a parameter.
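
To make the expected behavior concrete, here is a minimal client-side sketch in Python. It is only a guard a client could apply before issuing the DELETE, not how Elasticsearch behaves or is implemented. It assumes the caller has already fetched the alias mapping in the shape returned by the get-aliases API (index name mapped to an `aliases` object); the function name `plan_delete` and the `on_alias` parameter are hypothetical.

```python
def plan_delete(name, aliases_by_index, on_alias="fail"):
    """Decide what a DELETE on `name` should do.

    `aliases_by_index` is assumed to look like the get-aliases response:
    {"users": {"aliases": {"user_12": {...}}}, "user_7": {"aliases": {}}}
    """
    # A real index: deleting it outright is what the caller wants.
    if name in aliases_by_index:
        return "delete_index"

    # Otherwise check whether any index exposes `name` as an alias.
    is_alias = any(
        name in info.get("aliases", {}) for info in aliases_by_index.values()
    )
    if not is_alias:
        raise KeyError(f"no index or alias named {name!r}")

    # An alias: never drop the shared index behind it.
    if on_alias == "delete_by_query":
        return "delete_by_query"
    raise ValueError(
        f"{name!r} is an alias; refusing to delete the underlying index"
    )


# Hypothetical cluster state: user 12 lives in the shared index behind an
# alias, user 7 has outgrown it and owns a dedicated index.
cluster = {
    "users": {"aliases": {"user_12": {"filter": {"term": {"user_id": 12}}}}},
    "user_7": {"aliases": {}},
}
print(plan_delete("user_7", cluster))
print(plan_delete("user_12", cluster, on_alias="delete_by_query"))
```

The default of failing on an alias is deliberate in this sketch: the destructive path has to be requested explicitly.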

What do you think?

--

Chris_Male · October 11, 2012, 11:30pm

That does sound a little unexpected, certainly in the use case you put
forward. Why don't you open an issue so we can explore what improvements
can be made at the code level?

--

Benjamin_Deveze · October 12, 2012, 6:46am

Thanks!

Done here: "Index aliases and delete operation", Issue #2318, elastic/elasticsearch on GitHub.

--