Any thoughts on supporting an "update" feature in ES?
We have a need to update a number of documents - and rather than
rebuild the ES document and re-index, we'd rather read the data from
ES (since it has all the data), modify it, and re-index (this will be
much snappier, as re-assembling the ES document is a bit costly for us).
What would be one step better is calling update with a query/filter and
passing, say, a JS function to do our update, and having ES execute it
and re-index all under the hood. That way we avoid having to write a
bunch of code for scrolling through large result sets and batching
indexing operations, and we avoid shipping all the data back to the
client and then back to the ES node(s).
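The requested semantics might look roughly like the following in-memory sketch. Everything here (the `update_by_query` name, the `matches` filter, the update function) is hypothetical, illustrating the requested behavior rather than any ES API that existed at the time:

```python
# In-memory sketch of the proposed "update with query/filter + function".
# Server-side, this would find the matching docs, run the function, and
# re-index the results, without round-tripping data to the client.
# All names are hypothetical, not a real Elasticsearch API.

def update_by_query(index, matches, update_fn):
    """Apply update_fn to every document matching the filter, in place."""
    for doc_id, doc in index.items():
        if matches(doc):
            index[doc_id] = update_fn(doc)

index = {
    "1": {"product": "widget", "quantity": 3},
    "2": {"product": "gadget", "quantity": 7},
}

# e.g. "increment the quantity on every widget"
update_by_query(
    index,
    matches=lambda d: d["product"] == "widget",
    update_fn=lambda d: {**d, "quantity": d["quantity"] + 1},
)
```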
Yes, that certainly makes sense. The difficulty of handling this revolves
around the distributed nature (more specifically, the replication) of
operations. There are different ways to implement it:
Have the update js function run on the primary shard, and then batch the
changes using a mechanism similar to the batch API. This will mean
replicating the data to the replicas. This is simpler to implement, though
the update is not atomic or blocking (other index operations on the same
data might "get in").
This does require refreshing the relevant shard(s) before execution in
order to see the latest data.
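A minimal simulation of this first approach (all names are illustrative, not Elasticsearch internals): the function executes only on the primary, and the resulting documents, not the function itself, are shipped to the replicas, much like a batch of ordinary index operations:

```python
# Sketch of approach 1: the update function runs on the primary only;
# the *resulting documents* are then replicated, like batched index ops.
# All names are illustrative, not Elasticsearch internals.

def update_on_primary(primary, replicas, update_fn):
    # Run the function once, on the primary's copy of the data.
    changed = {doc_id: update_fn(doc) for doc_id, doc in primary.items()}
    primary.update(changed)
    # Replicate the concrete documents (not the function) to each replica.
    for replica in replicas:
        replica.update(changed)

primary = {"1": {"count": 1}}
replica = {"1": {"count": 1}}
update_on_primary(primary, [replica], lambda d: {**d, "count": d["count"] + 1})
# Primary and replica hold identical documents afterwards.
```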
Have the update function run on both the primary and the replicas. This is
more efficient in that the data does not need to be transferred to the
replicas, but the query will be executed on all replicas, and it's much
harder to maintain consistency between a shard and its replicas in this
case (which must be maintained, of course).
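Why this second approach makes consistency hard can be seen with a small simulation (again, all names are illustrative): if the update function is not deterministic, for example because it reads a node-local clock, the primary and the replica silently diverge when each executes it independently:

```python
# Sketch of approach 2: the update function runs independently on the
# primary and on each replica. If the function is non-deterministic
# (here: it reads a node-local clock), the copies silently diverge.
# All names are illustrative, not Elasticsearch internals.

def make_clock(start):
    """Per-node clock; different nodes are rarely in perfect sync."""
    state = {"t": start}
    def now():
        state["t"] += 1
        return state["t"]
    return now

def run_update(shard, update_fn, now):
    for doc_id, doc in shard.items():
        shard[doc_id] = update_fn(doc, now)

primary = {"1": {"msg": "hi"}}
replica = {"1": {"msg": "hi"}}
stamp = lambda doc, now: {**doc, "updated_at": now()}

run_update(primary, stamp, make_clock(100))  # primary's clock
run_update(replica, stamp, make_clock(250))  # replica's clock, out of sync

# primary and replica now disagree on "updated_at" for the same document.
```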
aparo has been talking about this as well (on IRC), and even went ahead and
implemented proof-of-concept code.
-shay.banon
I've implemented an update feature in ES for a client of mine following
approach 2. The problem, as Shay said, is the consistency between a shard
and its replicas.
There is a high risk of ending up with broken/corrupted data in your index
if there are problems during the update (verified in some of my border
cases).
So now I'm working on implementing the parent/child approach, as Shay
suggested to me.