As far as I know, an external table is just a link to the data stored in ES, and we can only append data to it or delete all of it. Is there any way to overwrite the data? For example, deleting only the records that match some selected column values, rather than deleting everything and loading the new data again.
I saw a similar topic, but it was asked in 2015; here is the link:
So I am wondering whether anything has changed since then?
It depends on your use case. ES-Hadoop does not support delete operations against ES, because deletes are fairly costly to perform from a distributed computing environment (to delete a doc, you need to pull its information just to get the ID needed to remove it). What kind of use case are you looking to satisfy with these delete-by-column-value access patterns? I'm wondering if there are better recipes to make your workload more Elasticsearch friendly.
Thank you for the reply!
Actually, all I want to do is replace the old data with the new data, i.e. overwrite it. What I do now is delete the old data with curl -XDELETE and then insert the new data into the ES table from Hive. This works well enough for me, but if there is a better way, I would definitely prefer it.
If your data has a unique ID for each record, you could instruct the ES-Hadoop connector to use the update/upsert operation when performing writes. In this case, when the job emits a record, Elasticsearch will track down the previous record data and replace it with the one being written. In the case of upsert, if the record does not exist, Elasticsearch will just create it instead of throwing an error.
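As a rough sketch of what that could look like on the Hive side: the table name, column names, index name, and the staging_records source table below are all made up for illustration, and the exact form of es.resource (with or without a type) depends on your Elasticsearch version. The two settings that matter are es.mapping.id, which names the column holding the document ID, and es.write.operation, which switches the connector from its default index operation to upsert.

```sql
-- Hypothetical external table backed by an Elasticsearch index.
-- Table, columns, and index name are placeholders, not taken from this thread.
CREATE EXTERNAL TABLE es_records (
  record_id STRING,
  name      STRING,
  updated   TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'        = 'myindex/mytype',  -- target index (and type, on older ES versions)
  'es.mapping.id'      = 'record_id',       -- column used as the Elasticsearch document ID
  'es.write.operation' = 'upsert'           -- update the doc if the ID exists, create it otherwise
);

-- Writing rows whose record_id already exists in the index replaces those documents;
-- rows with new IDs are created. staging_records is a hypothetical source table.
INSERT OVERWRITE TABLE es_records
SELECT record_id, name, updated
FROM staging_records;
```

With this setup there should be no need for the curl -XDELETE step as long as every new record reuses the ID of the record it replaces. Records that exist in the index but are absent from the new data are left untouched, so a full replacement would still need a delete.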