External Versioning Enhancement: need your input!

Simon,

It's not about timestamp vs. numeric version; it's about how to do it for a
complex, concurrently updated graph of objects where any part of it can
change, while ensuring that the denormalized version of the graph sent to
ES is not a stale one. One approach is for every application that pushes
denormalized objects to track changes to any part of the graph and roll
them up to the graph's top entity, tracking them with a numeric version or
a timestamp. This is fairly labor-intensive, and it is actually not trivial
in a highly concurrent environment if you use ORM technologies (JPA, JDO)
without introducing concurrency issues (applying optimistic locking to the
graph as a whole is not an option for a big graph, as it would introduce
huge concurrency problems). Another approach is to version every object of
the graph sent over to ES, passing their versions as an array. This is very
simple, because the versions are already present in the individual entities
and are managed as part of optimistic locking for those individual objects.

So while supporting an array of versions requires some work on the ES
side, it is logically very similar to using a single version value (which
would, of course, still be the default option).
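
A minimal sketch of what an "array of versions" check could mean, assuming
the versions are keyed by entity ID (a map rather than a positional array,
so the order of entities in the graph doesn't matter) and assuming an
entity missing on one side counts as version 0. This is not an existing ES
feature; `Order`, `compare_versions`, `accept_single` and `accept_array` are
hypothetical names. A push is accepted only if it is newer for at least one
entity and older for none, which is the usual `incoming > stored` rule
applied entity by entity.

```python
from enum import Enum
from typing import Mapping


class Order(Enum):
    NEWER = "newer"            # incoming is newer for some entity, older for none
    STALE = "stale"            # incoming is older for some entity, newer for none
    EQUAL = "equal"            # identical versions for every entity
    CONCURRENT = "concurrent"  # newer for some entities, older for others


def compare_versions(incoming: Mapping[str, int],
                     stored: Mapping[str, int]) -> Order:
    """Component-wise comparison of per-entity version maps.

    Hypothetical convention: an entity absent on one side counts as version 0.
    """
    keys = set(incoming) | set(stored)
    newer = any(incoming.get(k, 0) > stored.get(k, 0) for k in keys)
    older = any(incoming.get(k, 0) < stored.get(k, 0) for k in keys)
    if newer and older:
        return Order.CONCURRENT
    if newer:
        return Order.NEWER
    if older:
        return Order.STALE
    return Order.EQUAL


def accept_single(incoming: int, stored: int) -> bool:
    """The familiar single-version rule: accept only a strictly higher version."""
    return incoming > stored


def accept_array(incoming: Mapping[str, int], stored: Mapping[str, int]) -> bool:
    """Array rule: accept only if the incoming graph is newer than the stored one."""
    return compare_versions(incoming, stored) is Order.NEWER


stored = {"po:1": 3, "line:1": 5, "line:2": 2}
print(compare_versions({"po:1": 3, "line:1": 6, "line:2": 2}, stored))  # Order.NEWER
print(compare_versions({"po:1": 3, "line:1": 4, "line:2": 2}, stored))  # Order.STALE
print(compare_versions({"po:1": 3, "line:1": 6, "line:2": 1}, stored))  # Order.CONCURRENT
```

With a single entity in the map, `accept_array` reduces to `accept_single`.
The `CONCURRENT` result is the case a single number cannot express: each
side holds a newer version of some entity, as happens when two users edit
different line items of the same purchase order.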

I have to admit I have not thought through my approach in detail; I hoped
to get some discussion going and see what people do in cases like this.
I see some issues with it. For example, the versions of untouched entities
in the graph pushed to ES will not necessarily be the latest ones, yet no
optimistic locking exception will be raised. Say two users read the same
purchase order (PO); one updates one line item and the other updates a
different one. Unless we control concurrency at the PO level, denying one
user his update, both updates will succeed, and if both users push to ES
at the same time, ES will have inconsistent data. Unfortunately,
escalating concurrency control to the very top of the graph is not an
option in most cases, as it would cause a very high level of contention on
that single lock (or version indicator), contention the business logic
does not warrant.
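
A small simulation of the purchase-order scenario, under simplifying
assumptions: the database and each user's in-memory view are plain Python
objects, optimistic locking is per line item (a hypothetical `commit`
helper checks the row version), the PO-level version is deliberately never
bumped (no escalation), and the index is a dict where the last write wins.
Both commits succeed, yet the document left in the index is missing user
A's change.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


class OptimisticLockError(Exception):
    """Raised when the row a user read has since been changed by someone else."""


@dataclass
class LineItem:
    qty: int
    version: int = 1  # per-row optimistic-locking counter


@dataclass
class PurchaseOrder:
    version: int = 1  # PO-level version; NOT bumped by line edits (no escalation)
    lines: Dict[str, LineItem] = field(default_factory=dict)


def commit(db: PurchaseOrder, view: PurchaseOrder, line_id: str, qty: int) -> None:
    """Save one line edit made on a user's in-memory view of the PO."""
    row = db.lines[line_id]
    if row.version != view.lines[line_id].version:
        raise OptimisticLockError(f"line {line_id} changed since it was read")
    row.qty = qty
    row.version += 1
    # The user's view picks up only their own change, not anyone else's.
    view.lines[line_id] = LineItem(row.qty, row.version)


def denormalize(po: PurchaseOrder) -> dict:
    """Flatten the graph into the single document that would be pushed to ES."""
    return {"version": po.version,
            "lines": {lid: (li.qty, li.version) for lid, li in po.lines.items()}}


db = PurchaseOrder(lines={"1": LineItem(10), "2": LineItem(20)})
view_a, view_b = copy.deepcopy(db), copy.deepcopy(db)  # both users read the PO

commit(db, view_a, "1", 11)  # user A edits line 1: succeeds
commit(db, view_b, "2", 21)  # user B edits line 2: also succeeds

index = {}
index["po-1"] = denormalize(view_a)  # A pushes its snapshot
index["po-1"] = denormalize(view_b)  # B pushes right after and overwrites it

print(index["po-1"])    # line 1 still shows qty 10: A's edit is lost
print(denormalize(db))  # the database has both edits
```

Both pushed snapshots carry PO version 1, so a single PO-level version
cannot order them. Per-entity versions would at least reveal the conflict
(line 1 is newer in A's snapshot, line 2 in B's), but neither snapshot is
the correct document.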

I guess it may be that, fundamentally, a consistent push of an object
graph is not possible unless you enforce concurrency control on the entire
graph as a whole, which I do not think is acceptable in a transactional
system where many users update parts of the graph.

So maybe I will have to resort to a hybrid pull/push approach, where my
app servers post only the IDs of modified objects (or rather, the IDs of
their top-level owners), and the indexer takes all the IDs, collapses any
redundancies, and pulls the latest data into the index in an optimal way.
The downside is that pulling the graph from the DB on the indexer side
doubles the database load by duplicating all the reads, and potentially
adds more latency than with push.
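
A rough sketch of the hybrid pull/push indexer loop, assuming a single
indexer consumer and app servers that enqueue an owner ID only after their
transaction commits. The in-memory `queue.Queue` stands in for whatever
transport is actually used, and `load_latest` is a hypothetical bulk loader
(e.g. one query by owner IDs). Because the indexer always reads the current
state, the order in which IDs arrive no longer matters.

```python
import queue
from typing import Callable, Dict, List


def drain(q: "queue.Queue[str]", max_items: int) -> List[str]:
    """Take up to max_items queued owner IDs without blocking."""
    ids: List[str] = []
    while len(ids) < max_items:
        try:
            ids.append(q.get_nowait())
        except queue.Empty:
            break
    return ids


def index_batch(ids: List[str],
                load_latest: Callable[[List[str]], Dict[str, dict]],
                index: Dict[str, dict]) -> int:
    """Collapse duplicate IDs, pull current state once, and (re)index it."""
    unique = list(dict.fromkeys(ids))  # dedupe, keeping first-seen order
    docs = load_latest(unique)         # one bulk read of the latest data
    for owner_id in unique:
        if owner_id in docs:
            index[owner_id] = docs[owner_id]
        else:
            index.pop(owner_id, None)  # owner no longer exists: drop it
    return len(unique)


# Usage: app servers enqueue only top-level owner IDs after committing.
db = {"po:1": {"total": 31}, "po:2": {"total": 5}}  # stand-in for the database
q = queue.Queue()
for owner_id in ["po:1", "po:2", "po:1", "po:1"]:
    q.put(owner_id)

index: Dict[str, dict] = {}
count = index_batch(drain(q, 100),
                    lambda ids: {i: db[i] for i in ids if i in db},
                    index)
print(count, index)  # 2 {'po:1': {'total': 31}, 'po:2': {'total': 5}}
```

Four notifications collapse into two reads; the extra database reads and
the added latency are the trade-off described above.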

Any input or suggestions would be very welcome...

Alex

On Saturday, February 2, 2013 3:23:12 PM UTC-5, simonw wrote:

Hey,

I personally think that moving away from a numerical version is not
trivial to begin with, and I don't think it gives us a real gain in
functionality. When I had problems similar to yours, I used a timestamp as
the external version, and that works very well. Do you think this could
help you too?
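
A minimal local sketch of using a timestamp as the external version. With
ES external versioning (the `version` and `version_type=external`
parameters on an index request), a write is accepted only when the
supplied version is higher than the stored one. The sketch mimics that rule
with a dict instead of calling ES; `to_version`, `index_external` and
`VersionConflict` are hypothetical names. It assumes timestamps come from a
single clock (e.g. the database), since skew between several app servers'
clocks could misorder writes.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class VersionConflict(Exception):
    """Stands in for the version-conflict error the index would return."""


def to_version(modified_at: datetime) -> int:
    """Last-modified time (timezone-aware) as integer epoch milliseconds."""
    return (modified_at - EPOCH) // timedelta(milliseconds=1)


def index_external(store: Dict[str, Tuple[int, dict]], doc_id: str,
                   version: int, doc: dict) -> None:
    """Accept the write only if its version is higher than the stored one."""
    current = store.get(doc_id)
    if current is not None and version <= current[0]:
        raise VersionConflict(f"{doc_id}: version {version} <= stored {current[0]}")
    store[doc_id] = (version, doc)


store: Dict[str, Tuple[int, dict]] = {}
older = datetime(2013, 2, 2, 20, 0, 0, tzinfo=timezone.utc)
newer = older + timedelta(seconds=5)

index_external(store, "po-1", to_version(newer), {"total": 31})  # newer push lands first
try:
    index_external(store, "po-1", to_version(older), {"total": 30})  # late, stale push
except VersionConflict as err:
    print("rejected:", err)
```

For a denormalized graph, the timestamp has to reflect the latest change
anywhere in the graph, which is the roll-up problem raised in the reply
above.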

simon

On Friday, February 1, 2013 9:39:55 PM UTC+1, AlexR wrote:

Feedback, anyone?
Is it important to anyone who pushes from a database-backed app with
multiple app servers to avoid out-of-sequence update issues?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.