I have searched the issues list but found nothing about my scenario. I use ES as a NoSQL DB, so its data must stay strictly consistent with the data in the relational database. I use the Java client in TCP mode to submit the data to the ES server, and after that I save the data in Oracle; all of this code is in one method inside a Spring transaction. Can this method guarantee data consistency?
BTW, I know Kafka provides a way to send messages transactionally together with DB data; does ES have a similar plan?
What people are normally doing (i.e. the Hibernate Search team - cc @Sanne) is to use a transactional MQ system in the middle. If the message can't be delivered to ES, it basically rolls back the message, which rolls back the transaction (if I got it correctly).
I was doing something similar in the past. It was not ideal, but at least it was a good safeguard.
This is an interesting topic, with some ongoing research.
With Hibernate Search we indeed offer various options; depending on user requirements, one might not want the "primary" transactions (in our case the ones involving RDBMS updates) to be delayed or failed just because of some temporary connection issue to the search service.
In its simplest form we don't use any queue (so no persistent messaging implementation is required), which implies changes to Elasticsearch are registered as a post-transaction event. This helps because Hibernate Search is quite mature and knows how to keep the two worlds in sync, but it is indeed not transactional - what we offer is a way to re-sync the whole database as an extreme form of recovery.
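The trade-off of the post-transaction approach can be illustrated with a minimal sketch (all names here - `Tx`, `save`, the in-memory maps - are hypothetical stand-ins, not the Hibernate Search API, which wires this through ORM event listeners). The point is that the index update runs after the commit, so a failure there leaves the two stores out of sync until a re-sync:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "post-transaction event" approach: the index update is
// registered as an after-commit callback, NOT as part of the RDBMS
// transaction. All names are hypothetical.
public class PostCommitSketch {

    /** Stand-in for the relational database. */
    static final Map<String, String> db = new HashMap<>();
    /** Stand-in for the Elasticsearch index. */
    static final Map<String, String> index = new HashMap<>();

    /** A toy transaction that collects after-commit callbacks. */
    static class Tx {
        private final List<Runnable> afterCommit = new ArrayList<>();

        void registerAfterCommit(Runnable hook) { afterCommit.add(hook); }

        void commit() {
            // The DB write is already durable here; hooks run afterwards.
            for (Runnable hook : afterCommit) {
                try {
                    hook.run();
                } catch (RuntimeException e) {
                    // The DB commit cannot be rolled back any more: the two
                    // stores are now out of sync until a recovery re-sync.
                    System.out.println("index update failed: " + e.getMessage());
                }
            }
        }
    }

    static void save(String id, String value, boolean indexIsDown) {
        Tx tx = new Tx();
        db.put(id, value); // pretend this happens inside the transaction
        tx.registerAfterCommit(() -> {
            if (indexIsDown) throw new RuntimeException("ES unreachable");
            index.put(id, value);
        });
        tx.commit();
    }

    public static void main(String[] args) {
        save("1", "ok", false);  // happy path: both stores agree
        save("2", "lost", true); // ES down: DB has the row, index does not
        System.out.println(db.containsKey("2") && !index.containsKey("2"));
    }
}
```

This is exactly the failure mode the re-sync feature exists to repair.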
If you need transactions, what we allow is to push the changeset into a transactional MQ; this decouples the two systems while ensuring the message is stored within the same transaction. The MQ system is normally fast and will be able to send it off to the search system shortly after (so humans shouldn't notice any delay); if Elasticsearch fails to acknowledge the operation, the queue can keep the messages stored safely and retry later.
How this is set up is the architect's choice; for example, one might want to store messages in the MQ within the RDBMS transaction, which is safer but requires XA, or enlist it as a post-commit operation. For most queues to be really reliable you'll need HA messaging, so this gets complex to set up, but there are plenty of options.
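The "store the message within the same transaction" variant is essentially the transactional-outbox pattern. A minimal sketch, with all names (`saveInOneTx`, `relay`, the in-memory structures) hypothetical and the atomic DB+message write simulated rather than done through real XA or JDBC transactions:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the transactional-outbox variant: the business row and the
// "send to Elasticsearch" message are written in the SAME transaction, and
// a relay delivers the message afterwards, retrying until it is accepted.
// All names are hypothetical; the atomic write is simulated here.
public class OutboxSketch {

    static final Map<String, String> db = new HashMap<>();
    static final Deque<String[]> outbox = new ArrayDeque<>(); // {id, value}
    static final Map<String, String> index = new HashMap<>();

    /** Simulates one RDBMS transaction covering both row and message. */
    static void saveInOneTx(String id, String value) {
        db.put(id, value);
        outbox.addLast(new String[] {id, value});
        // If this "transaction" rolled back, neither write would survive,
        // so the index can never see a change the DB did not commit.
    }

    /** Delivers pending messages; a failed delivery stays queued. */
    static void relay(boolean indexIsDown) {
        while (!outbox.isEmpty()) {
            String[] msg = outbox.peekFirst();
            if (indexIsDown) return; // keep the message, try again later
            index.put(msg[0], msg[1]);
            outbox.removeFirst(); // acknowledge only after the index write
        }
    }

    public static void main(String[] args) {
        saveInOneTx("1", "v1");
        relay(true);  // ES down: message stays in the outbox
        relay(false); // retry succeeds, the two stores converge
        System.out.println(db.equals(index)); // prints "true"
    }
}
```

Note the ordering in `relay`: the message is removed only after the index write succeeds, giving at-least-once delivery, so the ES side should treat updates idempotently (indexing by document id already has that property).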
An alternative we've been exploring, especially good if you're not using Hibernate ORM (which Hibernate Search requires), is to use Debezium.
This is very nice as you can listen to the RDBMS transactions safely without requiring XA (nor Hibernate, so it can capture any change regardless of how it's performed), but there are limitations in how you can correlate events spanning multiple tables into a consistent domain operation. It is most effective if you can adjust your table structure for this specific goal. See https://debezium.io/blog/2018/01/17/streaming-to-elasticsearch/
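For reference, registering a Debezium source connector is a single JSON document posted to Kafka Connect. A sketch along the lines of the linked blog post, which uses the MySQL example database - hostnames, credentials, and table names are placeholders, and some property names have changed across Debezium versions (e.g. `table.whitelist` later became `table.include.list`):

```json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "table.whitelist": "inventory.customers",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Each captured table's changes then appear on a Kafka topic (here `dbserver1.inventory.customers`) for downstream consumers.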
Of course, I also hope ES can add a similar function to the Java client; this would help me avoid setting up a Kafka cluster. If I introduce Kafka, the architecture becomes heavy and I must handle message consumption failures myself.
BTW, if I use the ES Java client with a TCP connection to the ES server, a connection-disconnect event or other problem will throw an exception; is that enough to make the transaction safe?
Hi @douxf, Gunnar here from the Debezium team. In your code above, Kafka won't be part of a joint transaction with the source database, so you may end up with inconsistent data in the index if one of the two (unrelated) transactions is rolled back.
If you're using Kafka already, I'd definitely take a look at using Debezium to capture changes in your source database together with the kafka-connect-elasticsearch sink connector. As far as joining multiple table streams is concerned, you might find this post from our blog interesting, which discusses a possible approach for doing that using KStreams. We're also exploring further alternatives here and hope to blog about those soon.
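The sink side is registered with Kafka Connect the same way. A sketch of a kafka-connect-elasticsearch configuration, assuming the source-side topic name from the Debezium MySQL example; the URL and topic are placeholders, and `type.name` only applies to Elasticsearch versions that still use mapping types:

```json
{
  "name": "elastic-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://elasticsearch:9200",
    "topics": "dbserver1.inventory.customers",
    "key.ignore": "false",
    "type.name": "customer"
  }
}
```

With `key.ignore` set to `false`, the Kafka record key (the table's primary key) becomes the Elasticsearch document id, so repeated deliveries of the same change simply overwrite the same document.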