Hive real time or near real time sync with ES

Dear All,
I have a scenario: I can successfully bring Hive table data into ES. However, when rows in that same Hive table are later updated, getting the updated rows into the same ES index requires manual work.
Are there any settings or parameters we can add or modify so that changes to the Hive table are watched continuously and synced to ES without manual intervention? When several processing jobs or algorithms run on big data, it is hard to keep track of updated data in Hive databases/tables manually.

Any pointers will be helpful.

Hi @bharat1.

Which tool are you using to move the data from Hive to ES?

Are you using Logstash or the elasticsearch-hadoop (ES-Hadoop) jar files?


I am using the elasticsearch-hadoop jars.

Hi @bharat1.

If you are using the elasticsearch-hadoop jars, you have to create a staging table that receives the new data from your other (temp) table, and a second table that points directly at the Elasticsearch index. You then need a scheduler to move data from the staging table into the table that points at the index.
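For illustration, the table that points at the Elasticsearch index could be declared roughly as below. This is only a sketch: it builds the HiveQL as a string in Python, and the table, column, and index names are hypothetical. `EsStorageHandler`, `es.resource`, and `es.mapping.id` come from ES-Hadoop; setting `es.mapping.id` is an assumption on my part so that a re-sent row replaces the document with the same ID rather than creating a duplicate.

```python
def es_table_ddl(table, columns, es_index, id_column):
    """Build HiveQL for an external Hive table backed by an Elasticsearch index.

    All names passed in are placeholders; adjust them to your own schema.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        # es.mapping.id tells ES-Hadoop which column to use as the document ID
        f"TBLPROPERTIES ('es.resource' = '{es_index}', "
        f"'es.mapping.id' = '{id_column}')"
    )


# Hypothetical example: an "orders" index keyed by order_id
ddl = es_table_ddl(
    "orders_es",
    [("order_id", "STRING"), ("status", "STRING")],
    "orders",
    "order_id",
)
print(ddl)
```

The printed statement can be run once from the Hive CLI or Beeline, with the ES-Hadoop jar on the classpath.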

Did you try Logstash?



Unfortunately, I'm not aware of any APIs within Hive that would allow us to sync data between tables. It's important to remember that the Hive integration is exposed as a table, so the problem of syncing data between a Hive native table and the external ES-backed table is the same problem you might face when syncing two Hive native tables that have differing storage locations. Simply put, you'll need to either create some sort of tool that regularly exports data from one table to the other, or add some sort of ingestion logic that splits writes between the two tables.
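As a rough sketch of the "regularly exports data" option, a scheduled job (cron, Oozie, or similar) could generate and run an incremental statement like the one built below. Everything named here is an assumption for illustration: the source table is assumed to have a last-modified timestamp column (`updated_at`), the job is assumed to remember the time of its previous run, and `orders_staging` / `orders_es` are hypothetical table names. I have not tested this against a cluster.

```python
from datetime import datetime


def incremental_export_sql(source, target, columns, ts_column, last_sync):
    """Build HiveQL that copies rows changed since `last_sync` into `target`.

    `target` is assumed to be the external, ES-backed table; `ts_column` is an
    assumed last-modified column on the source table.
    """
    cols = ", ".join(columns)
    # Format the watermark the way Hive prints TIMESTAMP values
    watermark = last_sync.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"INSERT INTO TABLE {target}\n"
        f"SELECT {cols} FROM {source}\n"
        f"WHERE {ts_column} > '{watermark}'"
    )


# Hypothetical run: export everything modified since the previous sync
sql = incremental_export_sql(
    "orders_staging",
    "orders_es",
    ["order_id", "status", "updated_at"],
    "updated_at",
    datetime(2020, 1, 1, 0, 0, 0),
)
print(sql)
```

The scheduler would pass the generated statement to Hive (for example with `hive -e`) and then store the new watermark for the next run. If the source table has no reliable timestamp column, this approach does not work as written and the changed rows have to be identified some other way.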
