ES-Hadoop overwrite the external table

EvelynC · September 22, 2017, 9:02pm

Hi everyone,

As far as I know, an external table is just a link to the data stored in ES, and we can only append data or delete all of it. Is there any way to overwrite the data, for example by deleting only some selected columns, rather than deleting everything and loading the new data again?
I saw a similar topic, but it was asked in '15; here is the link:

So I am wondering whether any changes have been made recently.

Thanks!

james.baiera · October 4, 2017, 3:45am

It depends on your use case. ES-Hadoop does not support performing delete operations against ES, as deletes are fairly costly to perform from a distributed computing environment (to delete a doc, you need to pull its information just to get the ID to remove). What kind of use case are you looking to satisfy with these delete-by-column-value access patterns? I'm wondering if there are better recipes to make your workload more Elasticsearch friendly.

EvelynC · October 4, 2017, 5:54pm

Thank you for the reply!
Actually, all I want to do is replace the old data with the new data, that is, overwrite it. All I can do now is delete the old data with curl -XDELETE and then insert the data into the ES table from Hive. It does not seem to matter much, but if there is a better way, that would definitely be preferable.
Thanks again!

james.baiera · October 4, 2017, 6:09pm

If your data has unique IDs for each record, you could instruct the ES-Hadoop connector to use the update/upsert operation when performing writes. In this case, when the job emits a record, Elasticsearch will track down the previous record data and replace it with the one being written. In the case of upsert, if the record does not exist, Elasticsearch will just create it instead of throwing an error.
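
For anyone landing here later, a rough sketch of what that configuration might look like. It is a small Python helper that only builds the Hive DDL string; it does not talk to Hive or Elasticsearch. The settings `es.resource`, `es.mapping.id` and `es.write.operation` and the storage handler class come from the ES-Hadoop documentation, but the helper name `build_upsert_table_ddl`, the table `orders_es`, the columns and the `orders/order` resource are all made up for illustration, so adjust them to your own schema.

```python
def build_upsert_table_ddl(table, columns, es_resource, id_column):
    """Build Hive DDL for an ES-backed external table that writes via upsert.

    All arguments are illustrative; nothing here is taken from the thread.
    `columns` is a list of (name, hive_type) pairs.
    """
    if id_column not in dict(columns):
        raise ValueError(f"id column {id_column!r} is not one of the table columns")

    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)

    # ES-Hadoop settings: the target resource, the field used as the
    # document _id, and the write operation to use for each record.
    props = {
        "es.resource": es_resource,
        "es.mapping.id": id_column,
        "es.write.operation": "upsert",
    }
    prop_lines = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())

    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        f"TBLPROPERTIES (\n  {prop_lines}\n);"
    )


# Hypothetical table and index names, for illustration only.
ddl = build_upsert_table_ddl(
    table="orders_es",
    columns=[("order_id", "STRING"), ("status", "STRING")],
    es_resource="orders/order",
    id_column="order_id",
)
print(ddl)
```

With a table defined along these lines, inserting rows from Hive should replace documents whose ID matches and create the ones that do not exist yet. Note that documents which are no longer present in the new data are not removed by an upsert, so this is not a full overwrite.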

EvelynC · October 4, 2017, 6:21pm

Thank you!
I will try this!

