Hi, all. I am using Elasticsearch Hive integration, so that I can query from Hadoop tables, sending alerts when data is bad (with ElastAlert), as well as display on Kibana.
INSERT OVERWRITE TABLE my_elastic_table
SELECT {something} FROM my_hadoop_table;
However, it didn't really > OVERWRITE elastic_table, it actually appended to elastic_table. So I tried to TRUNCATE elastic_table, and it gave me the following error:
So I am asking if anyone know how to truncate, update or overwrite Elastic Hive tables. Or is there any better way to deal with this kind of problem. Thank you!
If you want to delete data, it's best to do it directly in Elasticsearch. TRUNCATE and other delete operations are not supported in Hive since manly they assume files on HDFS (or the so-called managed tables`). Which is not the case with ES.
Deleting content is ES is quite easy - it can be a simple curl call for a given index.
Instead of delete, is UPDATE supported? From Apache Hive integration Documentation. I see insert overwrite hive tables. However, as I tested a lot, it didn't really OVERWRITE the tables. Can I ask the reason for using insert overwrite in the documentation? Thanks a lot!
INSERT OVERWRITE is a bit of a misnomer. Due to the way Hive works OVERWRITE simply tells it to not pay attention to the table metadata (if its exists). Internally however the data is not updated; further more as a Hive adaptor, ES-Hadoop is unaware of whether INSERT or INSERT OVERWRITE was being used; Hive infrastructure doesn't provide any information on this front.
As a comparison Spark SQL for example does support different update modes (SaveModes as they are called).
Due to API limitations, one is best in accessing ES directly when doing data management and use Hive for reads and writes.
By the way, if you want to do updates see es.write.operation parameter which instructs ES-Hadoop to update data in ES instead of just indexing it as mentioned here.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.