Extracted from
http://groups.google.com/group/elasticsearch/browse_thread/thread/cbd2cc71c407e435
and modified to be more concrete
What kind of transactional (ACID) features does ES support? It would be
nice to have all the ACID properties in a transaction spanning the
entire work of an indexing process (some code that I will write, doing a
number of index operations against ES). I will not bother you with the
Isolation and Durability aspects here, but I will bother you with the
Atomicity and Consistency aspects.
Atomicity. If I have an indexing process (doing bulk
(http://www.elasticsearch.org/guide/reference/java-api/bulk.html)
indexing of many documents), do I get the Atomicity feature of ACID
transactions? Put another way: do I end up in a state where "all of the
documents or none of the documents have been indexed" when I call
"execute"? I guess not, since the example on
http://www.elasticsearch.org/guide/reference/java-api/bulk.html has a
comment "process failures by iterating through each bulk response item",
indicating that I will get detailed information back about which
documents were successfully indexed and which were not. Is that correct?
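To make the atomicity question concrete, here is a minimal Python sketch
of how I imagine having to handle a bulk result if it is not atomic. The
names BulkItemResult, split_bulk_results and is_all_or_nothing are made
up for illustration; they are not part of the ES API, and the real bulk
response may well look different.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BulkItemResult:
    """Hypothetical stand-in for one item of a bulk response."""
    doc_id: str
    error: Optional[str] = None  # None means the document was indexed

    @property
    def failed(self) -> bool:
        return self.error is not None


def split_bulk_results(
    items: List[BulkItemResult],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Separate the ids that were indexed from the (id, error) pairs that were not."""
    succeeded = [item.doc_id for item in items if not item.failed]
    failed = [(item.doc_id, item.error) for item in items if item.failed]
    return succeeded, failed


def is_all_or_nothing(items: List[BulkItemResult]) -> bool:
    """True only if every item succeeded or every item failed."""
    succeeded, failed = split_bulk_results(items)
    return not succeeded or not failed
```

If a bulk call can return a mix of successes and failures, then
is_all_or_nothing would be False for that response and my code would
have to retry or compensate for the failed items itself.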
Consistency. I know of at least 3 features in ES that will require special
attention in the ES implementation in order to also work when many
processes work with documents concurrently:
a) Making sure that there is never a violation of the unique constraint
on type/_id of documents in an index. Will the unique constraint
implementation on type/_id work correctly if many concurrent processes
try to index new documents with the same values for type and _id? Also if
the different processes use routing, so that the new documents with the
same values for type and _id are actually not routed to the same shard
(and therefore potentially not to the same node)? How well has this been
tested?
b) Making sure that the "optimistic locking"
(http://www.elasticsearch.org/blog/2011/02/08/versioning.html)
implemented around updating (re-indexing) of documents works. Will the
"optimistic locking" work correctly if many concurrent processes try to
update an existing document concurrently? Put another way: is it
guaranteed, if 100 processes try to update an existing document within
the same split second, that one and only one of those processes will
succeed and the other 99 processes will fail (with HTTP error code 409)?
How well has this been tested?
c) Same as b) above, but with deleting instead of updating (re-indexing)
and with HTTP error code 404 instead of 409.
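The routing scenario in a) can be sketched with a small in-memory model.
It assumes that the target shard is chosen as a hash of the routing value
(the _id when no routing is given) modulo the number of shards, and that
each shard checks uniqueness only against its own documents. Both are my
assumptions about how this might work, not confirmed ES behaviour.
ShardedIndex and shard_for are hypothetical names, and crc32 is only a
stand-in for whatever hash ES really uses.

```python
import zlib


def shard_for(routing: str, num_shards: int) -> int:
    """Pick a shard from the routing value (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


class ShardedIndex:
    """Hypothetical model: one dict per shard, uniqueness checked per shard."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def create(self, doc_type: str, doc_id: str, source: dict, routing=None) -> bool:
        """Create a new document; return False if the target shard already has it."""
        routing_value = routing if routing is not None else doc_id
        shard = self.shards[shard_for(routing_value, len(self.shards))]
        key = (doc_type, doc_id)
        if key in shard:
            return False  # uniqueness violation detected on this shard
        shard[key] = source
        return True

    def copies(self, doc_type: str, doc_id: str) -> int:
        """Count how many shards hold a document with this type/_id."""
        return sum((doc_type, doc_id) in shard for shard in self.shards)
```

In this model, two create calls with the same type and _id but routing
values that hash to different shards both return True, and copies then
reports 2. That is exactly the situation I am asking whether ES prevents.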
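The behaviour I am hoping for in b) and c) can be written down as a small
in-memory model plus a race between 100 threads. VersionedStore and race
are hypothetical names for this sketch, and the status codes simply mirror
the ones in my questions. The model says nothing about how ES actually
behaves; it only shows the guarantee I would like to have confirmed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class VersionedStore:
    """Hypothetical in-memory model of the desired behaviour (not ES code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        """Write a document; return 409 if expected_version does not match."""
        with self._lock:
            current_version = self._docs[doc_id][0] if doc_id in self._docs else 0
            if expected_version is not None and expected_version != current_version:
                return 409
            self._docs[doc_id] = (current_version + 1, source)
            return 200

    def delete(self, doc_id):
        """Remove a document; return 404 if it is already gone."""
        with self._lock:
            if doc_id not in self._docs:
                return 404
            del self._docs[doc_id]
            return 200

    def version(self, doc_id):
        """Return the current version, or None if the document does not exist."""
        with self._lock:
            return self._docs[doc_id][0] if doc_id in self._docs else None


def race(operation, workers=100):
    """Run the same operation from many threads and collect the status codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(operation) for _ in range(workers)]
        return [future.result() for future in futures]
```

With a document at version 1, race(lambda: store.index("1", {...},
expected_version=1)) gives exactly one 200 and 99 times 409 in this model,
and race(lambda: store.delete("1")) gives exactly one 200 and 99 times 404.
The check-and-write happens under a single lock here; my question is
whether ES gives the same outcome across processes, shards and nodes.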
Regards, Per Steffensen