Hi all,
I would like to push data from a Hive table to an ES index. I have managed to do it with Hive using an external table csv_demo_export_es (create external table csv_demo_export_es STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' ...)
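For reference, the full DDL is roughly the following (column types are illustrative, and the es.resource / es.nodes values are placeholders for my actual index and cluster):

CREATE EXTERNAL TABLE csv_demo_export_es (
  event_id STRING,
  host STRING,
  collection_time TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'csv_demo/event', 'es.nodes' = 'localhost:9200');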
Using Hive (beeline CLI) I am able to write into the selected index through this external table (INSERT OVERWRITE TABLE csv_demo_export_es SELECT ...).
Now I would like to do the same with Spark SQL, i.e. using pure SQL. With the Spark-SQL CLI I can see the external table I created with Hive that points to the ES index, and I can read from the index through it. But I cannot insert data into it as I could with Hive:
spark-sql (default)> INSERT OVERWRITE TABLE csv_demo_export_es SELECT event_id, host, collection_time FROM csv_demo_data WHERE year=2016;
17/01/04 10:35:47 ERROR SparkSQLDriver: Failed in [INSERT OVERWRITE TABLE csv_demo_export_es SELECT event_id, host, collection_time FROM csv_demo_data WHERE year=2016]
java.lang.RuntimeException: java.lang.RuntimeException: class org.elasticsearch.hadoop.mr.EsOutputFormat$EsOutputCommitter not org.apache.hadoop.mapred.OutputCommitter
I have also tried using a Spark SQL temporary table pointing to the same ES index (CREATE TEMPORARY TABLE myIndex USING org.elasticsearch.spark.sql).
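In full, something like this (again, the resource and nodes values are placeholders for my setup):

CREATE TEMPORARY TABLE myIndex
USING org.elasticsearch.spark.sql
OPTIONS (resource 'csv_demo/event', nodes 'localhost:9200');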
I am able to perform read operations (select * from myIndex) but not insert operations.
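Concretely, the read below works but the insert (sketched here in the form I tried) fails:

-- works:
SELECT * FROM myIndex;
-- fails:
INSERT INTO TABLE myIndex SELECT event_id, host, collection_time FROM csv_demo_data WHERE year=2016;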
The documentation describes how to read indices using pure SQL (https://www.elastic.co/guide/en/elasticsearch/hadoop/5.0/spark.html#spark-sql-read-ds) but not how to write into an index using pure SQL (this is only an issue with Spark SQL; no problem with Hive).
Thanks in advance