Update across indices


(Rahul Sharma) #1

Hi,

Elasticsearch nicely does an update behind the scenes when I provide an "_id"
while indexing. It's pretty cool!!
But this happens only within one index. Is it possible to do this across indices?

In my application I use rollover indexes: a new index is created after a
certain number of documents.
But there are scenarios where an update request comes in for a document that
lives in one of the old indexes, which the alias no longer points to.
Hence a new document is created instead of the old one being updated.

One way is to keep track of the ids in each index and then push the
document to that particular index if it's an update. But that's a very
ugly-looking approach. Is there a better way?

Alternatively, is there support for Update By Query, something like
DeleteByQuery?

Your response will be much appreciated!

Thanks
Rahul


(Benjamin Devèze) #2

Heya,

  1. For your first problem: there are no cross-index ids. If you do
    rollover indexes based on a certain number of documents, I think you
    are forced to maintain an id/index_name pair to properly update your
    old documents. You could alternatively call the update API on all your
    indexes, but that won't be great either. Have you considered using
    rollover indexes based on time?

  2. Update by query is not implemented yet, but there is an open issue
    for it: https://github.com/elasticsearch/elasticsearch/issues/1607
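
The id/index_name pairing from point 1 could be sketched roughly like this. It is only an in-memory illustration: the class name `IndexRegistry`, the alias `docs_current` and the index names are made up for the example, and a real application would need to persist the mapping somewhere durable.

```python
from typing import Dict


class IndexRegistry:
    """Remembers which concrete index each document id was first written to."""

    def __init__(self, write_alias: str) -> None:
        self.write_alias = write_alias          # alias pointing at the newest index
        self._index_by_id: Dict[str, str] = {}  # doc id -> concrete index name

    def record(self, doc_id: str, index_name: str) -> None:
        # Keep the first index we saw; later writes must go back to it.
        self._index_by_id.setdefault(doc_id, index_name)

    def target_for(self, doc_id: str) -> str:
        # Known id: send the write to its original index (an update).
        # Unknown id: send it through the alias (a fresh insert).
        return self._index_by_id.get(doc_id, self.write_alias)

    def forget(self, doc_id: str) -> None:
        # Call after a delete so the id can be inserted again via the alias.
        self._index_by_id.pop(doc_id, None)


# Hypothetical names, for illustration only.
registry = IndexRegistry(write_alias="docs_current")
registry.record("42", "docs_000001")
```

With this in place, `registry.target_for(doc_id)` gives the index name to use in the index request, so an update for an old document lands in the index that already holds it.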

--
Benjamin DEVEZE


(Benjamin Devèze) #3

Would it be interesting to add a new op_type "replace" to the index
API? In this case the doc would be indexed only if it already exists.

--
Benjamin DEVEZE


(Rahul Sharma) #4

Hey,
Thanks for your quick response.
I have two schemes for rollover. In one scenario it's based on time, whereas
in the other it's based on the number of docs.
It would be really helpful to get an update-by-query kind of API.

For now I am thinking of the following approaches:

  1. Keep an additional field in each document that stores the
    indexName.
    While indexing I would do a lookup by ids to find the docs that need an
    update and separate them from the ones that need an insert. The
    downside is the extra IO overhead.
  2. Create another associated index to keep track of the index vs doc ID
    mapping. Here I can store all the ids in one doc per index (comma separated)
    and process it in the application. The downside is that the doc size can
    go through the roof, and it costs more CPU cycles.
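
The lookup step in approach 1 could look something like the sketch below. It only builds the request body for an `ids` query and then splits a batch using the `_index` and `_id` of each hit in the search response. The function names, the alias `docs_current` and the sample response are assumptions made for the example, and no request is actually sent.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, str, Dict[str, Any]]  # (target index, doc id, doc body)


def build_ids_lookup(doc_ids: List[str]) -> Dict[str, Any]:
    # An "ids" query matches documents by _id; it would be run against
    # all rolled-over indexes (or an alias that covers them).
    # Assumes each id exists in at most one index, so one hit per id is enough.
    return {"query": {"ids": {"values": list(doc_ids)}}, "size": len(doc_ids)}


def split_batch(
    docs: Dict[str, Dict[str, Any]],
    search_response: Dict[str, Any],
    write_alias: str,
) -> Tuple[List[Action], List[Action]]:
    # Map every id that already exists to the index it was found in.
    hits = search_response["hits"]["hits"]
    existing = {hit["_id"]: hit["_index"] for hit in hits}

    updates: List[Action] = []
    inserts: List[Action] = []
    for doc_id, body in docs.items():
        if doc_id in existing:
            updates.append((existing[doc_id], doc_id, body))  # back to the old index
        else:
            inserts.append((write_alias, doc_id, body))       # new doc via the alias
    return updates, inserts


# Hypothetical batch and a trimmed-down, hand-written search response.
batch = {"1": {"title": "a"}, "2": {"title": "b"}}
response = {"hits": {"hits": [{"_index": "docs_000001", "_id": "1"}]}}
updates, inserts = split_batch(batch, response, "docs_current")
```

The IO overhead mentioned above is the one extra search per batch; batching many ids into a single lookup keeps that cost down.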

As of now I am more inclined towards approach 1, as I don't have to take
care of managing an additional index mapping, especially when handling
deletes.
Any other suggestions?

Thanks
Rahul

