Hive real time or near real time sync with ES

bharat1 · October 3, 2019, 7:16am

Dear All,
I have a scenario: I can successfully bring Hive table data into ES. However, when rows in the same Hive table are updated, manual work is required to bring the updated rows into ES under the same index.
Are there any settings or parameters we can add or modify so that changes in the Hive table are watched continuously and synced to ES without manual intervention? When several processing jobs and algorithms run on Big Data, it is hard to keep track of updated data in Hive databases and tables manually.

Any pointers will be helpful.

rameshkr1994 · October 3, 2019, 3:23pm

Hi @bharat1.

Which tool are you using to move the data from Hive to ES?

Are you using Logstash or the hadoop-elastic jar files?

Thanks
HadoopHelp

bharat1 · October 3, 2019, 3:33pm

Using hadoop-elastic jars

rameshkr1994 · October 4, 2019, 2:29pm

Hi @bharat1.

If you are using the hadoop_elastic jars, you have to create a staging table that receives the new data from your other temp table, and another table that points directly to the Elasticsearch index. You then need a scheduler to move the data from the staging table into the table that points to the index.
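
The staging approach described above could be sketched roughly as follows. This is only an illustration, not a tested setup: the table names, column list, and index name are hypothetical, and the helper functions simply build HiveQL strings that a scheduler would then submit to Hive. The storage handler class and the `es.resource`, `es.mapping.id`, and `es.write.operation` properties are ES-Hadoop settings; the choice of `upsert` here is an assumption about what "sync" should mean for updated rows.

```python
def es_table_ddl(table, columns, es_resource, id_column):
    """Build HiveQL for an external table backed by an Elasticsearch index.

    `columns` is a list of (name, hive_type) pairs. All names passed in
    are hypothetical placeholders chosen for illustration.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        "TBLPROPERTIES (\n"
        f"  'es.resource' = '{es_resource}',\n"
        # Use a Hive column as the document id so that a re-sent row
        # updates the existing document instead of creating a duplicate.
        f"  'es.mapping.id' = '{id_column}',\n"
        "  'es.write.operation' = 'upsert'\n"
        ");"
    )


def move_staging_sql(staging_table, es_table, column_names):
    """Build the HiveQL that copies staged rows into the ES-backed table."""
    cols = ", ".join(column_names)
    return f"INSERT OVERWRITE TABLE {es_table} SELECT {cols} FROM {staging_table};"


if __name__ == "__main__":
    # Hypothetical example: an "orders" index keyed by the "id" column.
    print(es_table_ddl("orders_es", [("id", "BIGINT"), ("status", "STRING")],
                       "orders", "id"))
    print(move_staging_sql("orders_staging", "orders_es", ["id", "status"]))
```

A scheduler (cron, Oozie, or similar) would run the generated `INSERT` statement at whatever interval counts as "near real time" for you, and clear the staging table once the move succeeds.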

Did you try Logstash?

Thanks
HadoopHelp

james.baiera · October 16, 2019, 8:24pm

@bharat1

Unfortunately, I'm not aware of any APIs within Hive that would allow us to sync data between tables. It's important to remember that the Hive integration is exposed as a table, so the problem of syncing data between a Hive native table and the external ES-backed table is the same problem you might face when syncing two Hive native tables that have differing storage locations. Simply put, you'll need to either create some sort of tool that regularly exports data from one table to the other, or add some sort of ingestion logic that splits writes between the two tables.
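
As a rough illustration of the first option (a tool that regularly exports data), one common pattern is to track a "high-water mark" and export only rows changed since the last run. The sketch below is a minimal, assumption-laden version: it presumes the source table has a monotonically increasing `updated_at` column (hypothetical), and `fetch_changed` and `write_rows` are placeholder callables standing in for a Hive query and a write to the ES-backed table. Neither is part of Hive or ES-Hadoop.

```python
from typing import Callable, Iterable, List, Optional

Row = dict


def sync_once(
    fetch_changed: Callable[[Optional[int]], Iterable[Row]],
    write_rows: Callable[[List[Row]], None],
    last_seen: Optional[int],
    key: str = "updated_at",
) -> Optional[int]:
    """Export rows changed since `last_seen` and return the new watermark.

    fetch_changed(last_seen) should return rows whose `key` value is
    greater than `last_seen` (or all rows when `last_seen` is None).
    write_rows(rows) should write them to the target table.
    """
    rows = list(fetch_changed(last_seen))
    if not rows:
        # Nothing changed, so the watermark stays where it was.
        return last_seen
    write_rows(rows)
    # Advance the watermark only after the write has succeeded.
    return max(row[key] for row in rows)


if __name__ == "__main__":
    # In-memory stand-ins for the Hive source table and the ES-backed table.
    source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
    target: List[Row] = []

    def fetch(since: Optional[int]) -> List[Row]:
        return [r for r in source if since is None or r["updated_at"] > since]

    watermark = sync_once(fetch, target.extend, None)
    print(watermark, len(target))
```

The returned watermark has to be persisted between runs (a small file or table would do), and the target should be keyed by a stable document id so that re-exported rows overwrite rather than duplicate.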

