Dear All,
I have a scenario: I am able to successfully load Hive table data into ES. But when rows in that same Hive table are updated, manual work is required to pick up the updated rows and push them into the same ES index.
Is there any setting or parameter we can add or modify that will continuously watch for changes in the Hive table and sync them to ES without manual intervention? When several processing jobs/algorithms run over big data, it is hard to manually keep track of which rows have been updated in the Hive DB/tables.
If you are using the elasticsearch-hadoop JARs, then you have to create a staging table that receives new data from your temp table, and create another external table that points directly at the Elasticsearch index. From the staging table, use a scheduler to periodically move the data into the index-backed table.
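As a rough sketch of that pattern, assuming elasticsearch-hadoop's Hive storage handler (the table names, columns, index name, and node address below are hypothetical examples):

```sql
-- External table backed by an Elasticsearch index (elasticsearch-hadoop).
CREATE EXTERNAL TABLE es_orders (
  id     BIGINT,
  status STRING,
  total  DOUBLE
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'   = 'orders/order',  -- target index/type
  'es.nodes'      = 'es-node:9200',  -- ES cluster address
  'es.mapping.id' = 'id'             -- use id as _id so rewrites overwrite docs
);

-- A scheduler (cron, Oozie, etc.) runs this periodically to push staged
-- rows into the index via the ES-backed table:
INSERT INTO TABLE es_orders
SELECT id, status, total FROM orders_staging;
```

Writing into `es_orders` sends the rows to Elasticsearch; setting `es.mapping.id` means re-inserting a row with an existing id updates the document instead of creating a duplicate.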
Unfortunately, I'm not aware of any APIs within Hive that would allow us to sync data between tables. It's important to remember that the Hive integration is exposed as a table, so the problem of syncing data between a Hive native table and the external ES-backed table is the same problem you might face when syncing two Hive native tables that have differing storage locations. Simply put: you'll need to either create some sort of tool that regularly exports data from one table to the other, or add some sort of ingestion logic that splits writes between the two tables.
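One common way to build such an export tool is an incremental sync keyed on a modification timestamp. This is only a sketch under the assumption that the source table has an `updated_at` column and that the ES-backed table was created with `es.mapping.id`; the names and the `${last_sync_ts}` variable are hypothetical:

```sql
-- Incremental export: push only rows modified since the last run.
-- Because the ES-backed table maps id to the document _id, re-inserting
-- an existing id overwrites the document rather than duplicating it.
INSERT INTO TABLE es_orders
SELECT id, status, total
FROM orders
WHERE updated_at > '${last_sync_ts}';  -- watermark supplied by the scheduler
```

The scheduler records the maximum `updated_at` it has processed and substitutes it as `last_sync_ts` on the next run, so each execution moves only the delta.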