Yes, that certainly makes sense. The difficulty of handling this revolves
around the distributed nature (more specifically, replication) of
operations. There are different ways to implement it:
1. Have the update js function run on the primary shard, and then batch
   the changes using a mechanism similar to the batch API. This means
   replicating the resulting data to the replicas. This is simpler to
   implement, though the update is not atomic or blocking (other index
   operations on the same data might "get in"). It does require refreshing
   the relevant shard(s) before execution in order to see the latest data.
2. Have the update function run on the primary and the replicas. This is
   more efficient in that the data does not need to be transferred to the
   replicas, but the query will be executed on all replicas, and it's much
   harder to maintain consistency between a shard and its replicas in this
   case (which must be maintained, of course).
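To make the trade-off a bit more concrete, here is a rough sketch of the two options as a toy in-memory model. Nothing here is actual ES code: Shard, update_on_primary, update_on_all_copies, and batch_size are made-up names for illustration, and a plain dict stands in for a shard copy.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, int]
Batch = List[Tuple[str, Doc]]


class Shard:
    """Toy in-memory shard copy (hypothetical): document id -> document."""

    def __init__(self, docs: Dict[str, Doc]) -> None:
        self.docs = {doc_id: dict(doc) for doc_id, doc in docs.items()}

    def apply_batch(self, batch: Batch) -> None:
        # Overwrite each document, like a batch of index operations.
        for doc_id, doc in batch:
            self.docs[doc_id] = dict(doc)


def update_on_primary(
    primary: Shard,
    replicas: List[Shard],
    matches: Callable[[Doc], bool],
    update: Callable[[Doc], Doc],
    batch_size: int = 2,
) -> int:
    """Option 1: run `update` once on the primary, then replicate the results."""
    # Snapshot the matching ids first (stands in for "refresh, then query").
    ids = [doc_id for doc_id, doc in primary.docs.items() if matches(doc)]
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        batch = [(doc_id, update(dict(primary.docs[doc_id]))) for doc_id in chunk]
        # Not atomic: other writes could land between two batches.
        for copy in [primary, *replicas]:
            copy.apply_batch(batch)
    return len(ids)


def update_on_all_copies(
    primary: Shard,
    replicas: List[Shard],
    matches: Callable[[Doc], bool],
    update: Callable[[Doc], Doc],
) -> int:
    """Option 2: run the query and `update` independently on every copy."""
    count = 0
    for copy in [primary, *replicas]:
        ids = [doc_id for doc_id, doc in copy.docs.items() if matches(doc)]
        for doc_id in ids:
            copy.docs[doc_id] = update(dict(copy.docs[doc_id]))
        if copy is primary:
            count = len(ids)
    return count
```

In the first function only the updated documents travel to the replicas, so the copies end up identical by construction. In the second no document data is shipped, but the copies only stay identical if every copy sees the same data and the update function is deterministic, which is the consistency problem mentioned above.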
aparo has been talking about it as well (on IRC), and even went ahead and
implemented a proof of concept.
On Thu, Nov 11, 2010 at 8:04 PM, Mooky firstname.lastname@example.org wrote:
Any thoughts on supporting an "update" feature in ES?
We have a need to update a quantity of documents - and rather than
rebuild the ES document and re-index, we'd rather read the data from
ES (since it has all the data), modify & reindex (this will be much
snappier, as re-assembling the ES document is a bit costly for us).
What would be one step better is calling update with a query/filter &
pass, say, a js function to do our update and have ES execute it & re-
index all under the hood. That way we avoid having to write a bunch of
scrolling through large results sets, batching indexing operations and
we avoid shifting all the data back to the client and then back to the