Update API & consistency


(revdev-2) #1

We have two processes, each running on a different machine, that operate on
an ES cluster. One process indexes a document using write consistency ALL
and then makes the document available to the other process right away. The
other process performs some other work and then needs to set a flag in
the document to make it searchable; this is done using the Update API. In
90-95% of cases this works fine, but in some rare instances we are
seeing 404s on the /_update request. Even more rarely (< 1%), we see a 409
response code. I would have thought that write consistency ALL would make the
document available to the Update API. Can anyone comment on this use case,
or on why we might be seeing these issues (or whether this is a known issue)? We're
currently running 0.90.7 in the cluster and have 5 members that are
data/master true.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7de196d8-ee4e-4502-a326-c9831b32673d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Write consistency = ALL returns to the client after all nodes that hold
shards have responded, but that does not necessarily mean they are
available for "get" by other clients (update is using get operation to
retrieve the document).

If you want causality in your data writes and reads, you must use the
refresh operation (with all the negative impacts on performance).

Jörg



(Luca Cavanna) #3

Write consistency in Elasticsearch is quite different from what has been
described here. You can think of it as a check done before indexing, to
make sure that enough copies of the data are available. It doesn't have
anything to do with when the response is returned; it only affects whether
the write operation is accepted in the first place.

On the other hand, the replication type (sync by default) affects when the
response is returned: with sync, after all copies have indexed the
document; with async, as soon as the primary has indexed it.

Using write consistency in the described use case doesn't make a difference
in terms of making the document available for get, which is used internally by the
update API. In fact, the get API works in real time, as it can retrieve
documents from the transaction log if a refresh has not yet happened in the
Lucene index. That means that when the index operation returns, using the
default sync replication, all copies (primary + replicas) have the document
at least in the transaction log, from which it can be retrieved using the get API,
no refresh needed.
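
To make the two knobs concrete, here is a minimal sketch of how the index
and get requests might be addressed over REST on 0.90.x. The helper names,
the host and the index/type/id values are made up for illustration; the
query parameters (consistency, replication, realtime) are the ones
discussed in this thread. It only builds URLs and does not talk to a cluster.

```python
from urllib.parse import quote, urlencode


def _doc_path(base, index, doc_type, doc_id):
    # Hypothetical helper: percent-encode each path segment separately.
    parts = [quote(p, safe="") for p in (index, doc_type, doc_id)]
    return base.rstrip("/") + "/" + "/".join(parts)


def index_url(base, index, doc_type, doc_id,
              consistency="quorum", replication="sync"):
    # consistency: pre-flight check on how many shard copies must be active.
    # replication: whether the response waits for the replicas (sync) or not.
    query = urlencode({"consistency": consistency, "replication": replication})
    return _doc_path(base, index, doc_type, doc_id) + "?" + query


def get_url(base, index, doc_type, doc_id, realtime=True):
    # realtime=true (the default) lets get read from the transaction log,
    # so no refresh is needed to fetch a freshly indexed document by id.
    query = urlencode({"realtime": "true" if realtime else "false"})
    return _doc_path(base, index, doc_type, doc_id) + "?" + query


if __name__ == "__main__":
    print(index_url("http://localhost:9200", "docs", "doc", "42",
                    consistency="all"))
    print(get_url("http://localhost:9200", "docs", "doc", "42"))
```

The indexing process would PUT the document to the first URL and hand the
id over only after that call has returned; the second process can then
fetch (or update) it by id without a refresh in between.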

It is interesting that in some cases you got a 404; that sounds weird. Are you sure
the update happens after the previous operation returned?

Also, using the update API the document is always retrieved from the
primary shard, modified and reindexed there (+ replication afterwards),
thus you could potentially set the replication to async without running
into problems when it comes to updating the document using the update API.

As for the 409, that's a version conflict, meaning that there are probably
multiple updates being made to the same document. You might want to tweak
retry_on_conflict to allow for retries when a conflict is found.
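
As a rough illustration of the retry_on_conflict suggestion, the sketch
below builds an update request that sets a flag through a partial document.
The function name, the host, the index/type/id values and the "searchable"
field are assumptions made up for the example; only the /_update endpoint
and the retry_on_conflict parameter come from this thread. The request is
only constructed here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_update_request(base, index, doc_type, doc_id, partial_doc,
                         retry_on_conflict=3):
    # Hypothetical helper: POST /{index}/{type}/{id}/_update with a partial doc.
    path = "/".join(quote(p, safe="") for p in (index, doc_type, doc_id))
    url = "{}/{}/_update?retry_on_conflict={}".format(
        base.rstrip("/"), path, int(retry_on_conflict))
    # The partial document is merged into the existing source on the primary.
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    req = build_update_request("http://localhost:9200", "docs", "doc", "42",
                               {"searchable": True}, retry_on_conflict=5)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. With
retry_on_conflict set, the get-modify-reindex cycle is retried on the
server side up to that many times before a 409 is returned to the client.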


