How to ingest data in real time from oracle to Elasticsearch

Hi everyone, I am using a loop in scala to query an Oracle table every 10 second, since Oracle table get continuously insertion. I create a select request then I create n json string containing n line from oracle that I push into Elasticsearch. After that I create a delete request to erase the n line from Oracle table that I have inserted into ES. I developped a completely beginner approach. So can you suggest me a better approach to load in real time or micro batch data from Oracle to ES and delete from Oracle. I heard about logstach or SreamSets. Do you have any idea? Thanks

You can certainly use Logstash to mirror your Oracle database into Elasticsearch and it excels in doing that. But Logstash won't delete anything from your database, it's capable only of doing one way sync (i.e. read from database and index into ES). Still, I can think of two workarounds for this issue that's worth trying (and requires extensive testing):

  • Logstash could call a stored procedure which selects and deletes data. The issue with this is that the whole process is not transactional, so if Logstash fails then you lose data (maybe this issue can be made a little bit better if using Logstash Persistent Queue)
  • Use a plain SQL query to index into Elasticsearch. But then have another external process that reads whatever is indexed in Elasticsearch and deletes from Oracle (you could use X-Pack Watcher here, but requires some plumbing)

Lastly, if you need to delete from Oracle, this somewhat resembles a queue based architecture. If it's the case, then using a queue server like Kafka, Redis, etc is more appropriate than using a database such as Oracle (Logstash can easily consume from those queue servers as well)

Regarding StreamSets, I am not aware of it, sorry.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.