Truncate or update Elastic Hive tables

yuan0122 · April 27, 2016, 1:03am

Hi, all. I am using Elasticsearch Hive integration, so that I can query from Hadoop tables, sending alerts when data is bad (with ElastAlert), as well as display on Kibana.

This is how I created the Elastic table:

CREATE EXTERNAL TABLE my_elastic_table (
timestamp BIGINT,
count BIGINT
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'my_index/my_type);

And I inserted into Elastic Hive table with:

INSERT OVERWRITE TABLE my_elastic_table
SELECT {something} FROM my_hadoop_table;

However, it didn't really > OVERWRITE elastic_table, it actually appended to elastic_table. So I tried to TRUNCATE elastic_table, and it gave me the following error:

FAILED: SemanticException [Error 10146]: Cannot truncate non-managed table elastic_table.

So I am asking if anyone know how to truncate, update or overwrite Elastic Hive tables. Or is there any better way to deal with this kind of problem. Thank you!

costin · April 30, 2016, 2:10pm

If you want to delete data, it's best to do it directly in Elasticsearch.
TRUNCATE and other delete operations are not supported in Hive since manly they assume files on HDFS (or the so-called managed tables`). Which is not the case with ES.

Deleting content is ES is quite easy - it can be a simple curl call for a given index.

yuan0122 · May 2, 2016, 6:12pm

Hi Costin, thanks for your reply.

Instead of delete, is UPDATE supported? From Apache Hive integration Documentation. I see insert overwrite hive tables. However, as I tested a lot, it didn't really OVERWRITE the tables. Can I ask the reason for using insert overwrite in the documentation? Thanks a lot!

costin · May 10, 2016, 7:59am

INSERT OVERWRITE is a bit of a misnomer. Due to the way Hive works OVERWRITE simply tells it to not pay attention to the table metadata (if its exists). Internally however the data is not updated; further more as a Hive adaptor, ES-Hadoop is unaware of whether INSERT or INSERT OVERWRITE was being used; Hive infrastructure doesn't provide any information on this front.

As a comparison Spark SQL for example does support different update modes (SaveModes as they are called).
Due to API limitations, one is best in accessing ES directly when doing data management and use Hive for reads and writes.

Hope this helps,

costin · May 10, 2016, 8:19am

By the way, if you want to do updates see es.write.operation parameter which instructs ES-Hadoop to update data in ES instead of just indexing it as mentioned here.

Topic		Replies	Views
ES-Hadoop overwrite the external table Elasticsearch es-hadoop	5	1459	November 1, 2017
Elasticsearch AND Hive table overwrite Action Elasticsearch	1	300	August 30, 2019
ES-Hadoop 2.0.2 jars and INSERT OVERWRITE Elasticsearch es-hadoop	4	1833	July 6, 2017
java.net.ConnectException when trying to overwrite a table in Hive? Elasticsearch es-hadoop	3	849	July 26, 2017
Insert Overwrite is not working through Hive-Elasticsearch Integration after multiple data import jobs through hive Elasticsearch es-hadoop	3	944	June 27, 2018

Truncate or update Elastic Hive tables

Related topics