Error When Writing to S3 from Hive External Table Over Elasticsearch


#1

We have set up an external table over Elasticsearch using the following method: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html

It works great. We are able to query our indexes in Hive.

However, we want to write the results of these queries to an S3 external table. When we try this, we run into the following error: "org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'"

We have specified 'es.nodes.wan.only'='true' in our create table statement, and we only get this error when writing the results of a query to S3. Plain queries against the table work. Any ideas what we're doing wrong?

Any help is much appreciated.

DROP TABLE IF EXISTS exampletable1;
CREATE EXTERNAL TABLE exampletable1 (
    example_field STRING,
    example_field2 TIMESTAMP,
    example_field3 STRUCT<example_field4:STRING, example_field5:STRING>,
    example_field6 DOUBLE
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
    'es.resource' = 'exampleindex/examplemapping',
    'es.nodes' = 'url:AWS PORT',
    'es.nodes.wan.only' = 'true'
);

DROP TABLE IF EXISTS exampletable2;
CREATE EXTERNAL TABLE exampletable2 (
    example_field STRING,
    example_field2 TIMESTAMP,
    example_field3 STRUCT<example_field4:STRING, example_field5:STRING>,
    example_field6 DOUBLE
)
LOCATION 'example s3 location';

INSERT OVERWRITE TABLE exampletable2
SELECT * FROM exampletable1;


#2

We found that it was a permissions issue on our side. We had only given the master node of our EMR cluster access to our Elasticsearch cluster. All nodes in the EMR cluster need access, not just the master.
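
For anyone hitting the same error, here is a rough sketch of a check that can be run on each EMR node (master, core, and task) to see whether it can open a TCP connection to the Elasticsearch endpoint. The function name can_reach and the host in the usage comment are made up for illustration; substitute whatever you have in 'es.nodes'. It only tests network reachability, not Elasticsearch authentication. Presumably the plain queries worked for us because they were served from the master node, while the INSERT ran tasks on the other nodes.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of
    elasticsearch-hadoop or any AWS tooling.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage on a node (host and port are placeholders):
# print(socket.gethostname(), can_reach("es.example.internal", 9200))
```

If this returns False on the core or task nodes but True on the master, the access rules in front of the Elasticsearch cluster (security groups, access policy, or similar) are the likely place to look.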

