Transactional ACID features in ES

On Mon, Sep 12, 2011 at 3:55 PM, Per Steffensen steff@designware.dk wrote:

Extracted from
http://groups.google.com/group/elasticsearch/browse_thread/thread/cbd2cc71c407e435
and modified to be more concrete.

What kind of transactional ACID features does ES support? It would be nice
to have all the ACID-properties in a transaction spanning the entire
work of an indexing-process (some code that I will write doing a number of
index-operations against ES). I will not bother you with the Isolation and
Durability aspects here. But I will bother you with the Atomicity and
Consistency aspects.

Atomicity. If I have an indexing-process doing bulk indexing of many
documents (http://www.elasticsearch.org/guide/reference/java-api/bulk.html),
do I get the Atomicity feature of ACID transactions? Put another way: do I
end in a state where "all of the documents or none of the documents have
been indexed" when I call "execute"? I guess not, since the example on that
page has the comment "process failures by iterating through each bulk
response item", indicating that I will get detailed information back about
which documents were successfully indexed and which were not. Is that
correct?

Yes. Atomicity is per document.
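
To make "per document" concrete, here is a minimal sketch of what "iterating through each bulk response item" can look like on the client side. It is only an illustration: the response shape (an "items" list where each entry is keyed by the action name and carries a "status" and, on failure, an "error") is assumed from the REST _bulk API of recent ES versions, and the index name, ids, and error text in SAMPLE_RESPONSE are made up. The helper name failed_items is hypothetical, not part of any client library.

```python
import json


def failed_items(bulk_response):
    """Return (position, action, doc_id, status) for every item that failed.

    Assumes the REST-style bulk response shape: {"items": [{action: {...}}]}.
    """
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict such as {"index": {...}}.
        for action, result in item.items():
            if "error" in result or result.get("status", 200) >= 300:
                failures.append(
                    (position, action, result.get("_id"), result.get("status"))
                )
    return failures


# Hypothetical response: the first document was indexed, the second was rejected.
SAMPLE_RESPONSE = json.loads("""
{
  "errors": true,
  "items": [
    {"index": {"_index": "docs", "_id": "1", "status": 201}},
    {"index": {"_index": "docs", "_id": "2", "status": 400,
               "error": {"type": "mapper_parsing_exception",
                         "reason": "made-up failure for illustration"}}}
  ]
}
""")

failures = failed_items(SAMPLE_RESPONSE)
for position, action, doc_id, status in failures:
    print(f"item {position}: {action} of _id={doc_id} failed with status {status}")
```

The point is that a bulk request can come back partly applied: document 1 stays indexed even though document 2 failed, so any all-or-nothing behaviour (retrying or undoing the successful items) has to be built by the caller.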

Consistency. I know of at least 3 features in ES that will require special
attention in the ES implementation in order to also work when many processes
work with documents concurrently:
a) Making sure that there is never a violation of the unique constraint on
type/_id of documents in an index. Will the unique constraint implementation
on type/_id work correctly if many concurrent processes try to index new
documents with the same values for type and _id? Also if the different
processes use routing, so that the new documents with the same values for
type and _id are actually not routed to the same shard (and therefore
potentially not the same node)? How well has this been tested?

It handles concurrent updates.

b) Making sure that the "optimistic locking"
(http://www.elasticsearch.org/blog/2011/02/08/versioning.html) implemented
around updating (re-indexing) of documents works. Will the "optimistic
locking" work correctly if many concurrent processes try to update an
existing document concurrently? Put another way, is it guaranteed that if
100 processes try to update an existing document in the same split second,
one and only one of those processes will succeed and the other 99 processes
will fail (with HTTP error code 409)? How well has this been tested?

Yes.
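
For readers who want to see the guarantee spelled out, the following is a small in-memory model of version-checked writes. It is not ES code and does not talk to a cluster: VersionedStore, VersionConflict, and race are hypothetical names, and the single lock stands in for whatever serialises writes to one document on the server. It only shows the compare-and-set rule the question describes, where a write carrying a stale version is rejected.

```python
import threading


class VersionConflict(Exception):
    """Stands in for the HTTP 409 response mentioned in the question."""


class VersionedStore:
    """Toy document store that applies a version check on every write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}  # doc_id -> (version, body)

    def index(self, doc_id, body, expected_version=None):
        with self._lock:
            current_version = self._docs.get(doc_id, (0, None))[0]
            # Reject the write if the caller's version is stale.
            if expected_version is not None and expected_version != current_version:
                raise VersionConflict(
                    f"expected {expected_version}, found {current_version}"
                )
            new_version = current_version + 1
            self._docs[doc_id] = (new_version, body)
            return new_version


def race(store, doc_id, writers, expected_version):
    """Let `writers` threads update the same document with the same version."""
    outcomes = [None] * writers  # each thread writes only its own slot

    def attempt(n):
        try:
            store.index(doc_id, {"writer": n}, expected_version=expected_version)
            outcomes[n] = "ok"
        except VersionConflict:
            outcomes[n] = "conflict"

    threads = [threading.Thread(target=attempt, args=(n,)) for n in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


store = VersionedStore()
store.index("1", {"writer": "initial"})  # document now has version 1
outcomes = race(store, "1", writers=100, expected_version=1)
print(outcomes.count("ok"), outcomes.count("conflict"))  # prints "1 99"
```

Whichever writer gets in first bumps the version to 2, so the remaining 99 fail the check. Each of those would then need to re-read the document and retry with the new version if its change should still be applied.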

c) Same as b) above, but with deleting instead of updating (re-indexing)
and with HTTP error code 404 instead of 409.

Yes.

Regards, Per Steffensen