I would like to propose an enhancement to ES external versioning. Currently an
external version is a long number, and a larger number implies a newer version.
This works great if your source is a single entity, say a simple persistent
object managed with JPA, where we pass to ES the version that JPA uses for
optimistic locking of that entity.
It does not work so well when you pass a graph of persistent objects to ES,
which happens all the time since ES is all about denormalizing the data. The
problem is that each entity of the graph now has its own version, and in
most cases a modification of a child entity's content should only increment
that entity's version, not the version of the owning (or related) objects.
So for this real-world scenario the simple and elegant idea of reusing your
source entity versioning infrastructure does not work.
So here is the idea:
Support a composite version indicator, which is simply an array of the
versions of the source entities that make up the denormalized ES document.
ES would not need to understand it (it is the data provider's
responsibility to supply the right array of versions); it would just compare
the arrays element by element from left to right, and the first element
that differs decides which version is newer.
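A minimal Java sketch of the flat comparison, assuming the left-to-right rule in which the first differing element decides. `FlatVersion` and `isNewer` are hypothetical names, not part of any Elasticsearch API, and the sketch assumes the data provider always sends arrays of the same length and element order.

```java
/** Hypothetical flat composite-version check (not an Elasticsearch API). */
final class FlatVersion {
    private FlatVersion() {}

    /**
     * Returns true if {@code incoming} is newer than {@code stored}: the arrays are
     * compared element by element from left to right and the first differing
     * element decides. Equal arrays are not newer. Both arrays must have the same length.
     */
    static boolean isNewer(long[] stored, long[] incoming) {
        if (stored.length != incoming.length) {
            throw new IllegalArgumentException("composite versions must have the same length");
        }
        for (int i = 0; i < stored.length; i++) {
            if (incoming[i] != stored[i]) {
                return incoming[i] > stored[i]; // first difference decides
            }
        }
        return false; // identical arrays: nothing changed
    }

    public static void main(String[] args) {
        long[] stored = {3, 7, 2}; // [personVersion, homeAddressVersion, workAddressVersion]
        System.out.println(isNewer(stored, new long[] {3, 8, 2})); // true: only the home address changed
        System.out.println(isNewer(stored, new long[] {3, 6, 2})); // false: stale home address
    }
}
```

With this rule an incoming [3, 8, 1] would also count as newer than a stored [3, 7, 2], even though the work address version went down; a stricter variant could additionally require that no element is smaller. For equal-length arrays, `java.util.Arrays.compare(long[], long[])` (Java 9+) performs the same lexicographic comparison.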
There could be two main use cases:
- When the parent object of the denormalized graph has 1-1 relationships
  with all its parts (say Person has references to HomeAddress and
  WorkAddress), the version array has a fixed length:
  [personVersion, homeAddressVersion, workAddressVersion].
- When there are any 1-N relationships (e.g. a PurchaseOrder and its
  POLines get denormalized), we use a nested array for the PO lines:
  [purchaseOrderVersion, POLinesVersions[]]. The trick here is that the
  parent version must precede the child versions array. If the parent
  versions are not equal, there is no need to continue. If they are equal,
  then both versions of the document must contain the same collection of
  POLines (any difference due to add/remove operations on the collection
  should increment the PurchaseOrder version), and thus the version arrays
  must a) have the same number and order of elements, and b) in the newer
  PurchaseOrder, contain larger versions for the lines that changed.
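On the data provider side, the two use cases might be turned into version arrays roughly as follows. This is a sketch under assumptions: the records are hypothetical stand-ins for JPA entities (their `version` component plays the role of the `@Version` field), `VersionArrays` is a hypothetical helper, and lines are sorted by id so that two snapshots of the same PurchaseOrder list their POLines in the same order. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for JPA entities; `version` plays the role of the @Version field.
record Address(long id, long version) {}
record Person(long id, long version, Address home, Address work) {}
record POLine(long id, long version) {}
record PurchaseOrder(long id, long version, List<POLine> lines) {}

/** Hypothetical data-provider helper that builds composite versions (not an Elasticsearch API). */
final class VersionArrays {
    private VersionArrays() {}

    /** 1-1 parts: fixed-length [personVersion, homeAddressVersion, workAddressVersion]. */
    static List<Object> of(Person p) {
        return List.of(p.version(), p.home().version(), p.work().version());
    }

    /** 1-N parts: [purchaseOrderVersion, [lineVersion, ...]], lines sorted by id for a stable order. */
    static List<Object> of(PurchaseOrder po) {
        List<Object> lineVersions = new ArrayList<>();
        po.lines().stream()
                .sorted(Comparator.comparingLong(POLine::id))
                .forEach(line -> lineVersions.add(line.version()));
        return List.of(po.version(), lineVersions); // parent version precedes the child array
    }
}
```

Sorting by a stable key is just one way to satisfy the "same number and order of elements" requirement; any ordering works as long as the provider applies it consistently.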
The description is rather lengthy, but the algorithm ES would need is
trivial: compare arrays of integers, or nested arrays of integers,
recursively. Very simple and fast!
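A sketch of that recursive comparison, again assuming the first differing element decides. A composite version is modeled (hypothetically) as either a `Long` or a `List` of composite versions; `CompositeVersions` is not an Elasticsearch API. Because a nested array is only reached when every element before it was equal, the sketch treats a length mismatch at that point as a violation of the "same number and order of elements" contract and throws. Pattern matching in `instanceof` requires Java 16+.

```java
import java.util.List;

/** Hypothetical recursive comparison of composite versions (not an Elasticsearch API). */
final class CompositeVersions {
    private CompositeVersions() {}

    /** Negative if a is older than b, zero if equal, positive if a is newer. */
    static int compare(Object a, Object b) {
        if (a instanceof Long x && b instanceof Long y) {
            return Long.compare(x, y);
        }
        if (a instanceof List<?> xs && b instanceof List<?> ys) {
            // Only reached when all preceding elements were equal, so shapes must match.
            if (xs.size() != ys.size()) {
                throw new IllegalArgumentException("different number of elements: " + xs + " vs " + ys);
            }
            for (int i = 0; i < xs.size(); i++) {
                int c = compare(xs.get(i), ys.get(i)); // recurse into nested arrays
                if (c != 0) {
                    return c; // first difference decides
                }
            }
            return 0;
        }
        throw new IllegalArgumentException("mismatched shapes: " + a + " vs " + b);
    }

    static boolean isNewer(Object stored, Object incoming) {
        return compare(incoming, stored) > 0;
    }
}
```

For example, with a stored [5, [1, 4, 2]], an incoming [5, [1, 5, 2]] (one line edited) and an incoming [6, [1, 4]] (a line removed, so the order version was bumped) are both accepted, while [5, [1, 4]] is rejected as a contract violation.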
Such an approach would allow very robust external versioning of
denormalized object graphs based on the versions of their parts. Simple
number-based versioning would of course stay as well.
An alternative to an array could be a binary string with a predefined
length per version, but I think an array is easier to deal with.
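For comparison, the binary-string alternative could look like the sketch below (hypothetical `BinaryVersion` helper, not an Elasticsearch API). Encoding each non-negative version as 8 big-endian bytes makes an unsigned byte-wise comparison of two equal-length strings equivalent to comparing the versions left to right. It only covers the fixed-length case; a variable number of POLines would need extra framing, which may be part of why an array is easier to deal with.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical fixed-width encoding of a flat composite version (not an Elasticsearch API). */
final class BinaryVersion {
    private BinaryVersion() {}

    /** Encodes each version as 8 big-endian bytes; versions are assumed to be non-negative. */
    static byte[] encode(long... versions) {
        ByteBuffer buf = ByteBuffer.allocate(versions.length * Long.BYTES); // big-endian by default
        for (long v : versions) {
            if (v < 0) {
                throw new IllegalArgumentException("negative version: " + v);
            }
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Unsigned byte-wise comparison of equal-length encodings matches left-to-right version comparison. */
    static boolean isNewer(byte[] stored, byte[] incoming) {
        return Arrays.compareUnsigned(incoming, stored) > 0; // Java 9+
    }
}
```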
As for me, it would dramatically simplify my life and improve performance,
as I would no longer need to intercept changes to the versions of all parts
to increment a synthetic version of the entire graph (not to mention that
when denormalization can be done in several ways, one needs to maintain or
calculate several synthetic versions for each graph).
What do you think? Is it something worth proposing to the ES dev team?
Thank you,
Alex
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.