Typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I hosted ElasticSearch on Azure and connecting to the Hive to load data into elastic search.

I am doing steps as follows

add jar adl://stepoc.azuredatalakestore.net/ES-Hadoop-Jars/elasticsearch-hadoop-6.3.0.jar;
add jar adl://stepoc.azuredatalakestore.net/ES-Hadoop-Jars/elasticsearch-hadoop-hive-6.3.0.jar;

CREATE EXTERNAL TABLE IF NOT EXISTS Reliance_Processed_ES
(
formId INT,
formName STRING,
updatedAt STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='reliance/data', 'es.nodes'='X.X.X.X', 'es.port'='9200', 'es.nodes.wan.only'='true');

I am getting following error.

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Note:
Firewalls are turned off.

First, make sure that ONLY ONE of those jars are added. The first one there includes all integrations, and the second one includes only the hive integration code + the core code. Adding both will trip up internal checks for only one jar on the classpath.

Secondly, this exception is a fairly high level exception. Can you include the exception text from the Hive server logs?

I have attached the log file in the following link, this log is after adding only one required hive jar.

Link for Log

updated queries which I am running are below.

add jar adl://stepoc.azuredatalakestore.net/ES-Hadoop-Jars/elasticsearch-hadoop-hive-6.3.0.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS Reliance_Processed_ES
(
formId INT,
formName STRING,
updatedAt STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='reliance/data', 'es.nodes'='x.x.x.x', 'es.port'='9200', 'es.nodes.wan.only'='true','es.net.http.auth.user'='sampleuser','es.net.http.auth.pass'='samplepassword','es.nodes.discovery'='true');

INSERT OVERWRITE TABLE Reliance_Processed_ES select * from Reliance_Processed;

Note:
#Not even able to access local instance of Elasticsearch.
#Firewalls are turned off too.

This seems like an issue with how your cloud network is configured, especially if it cannot even communicate with the local Elasticsearch instance. HiveServer should produce more logs as to why the operation has failed, but based on these logs alone, there is not enough information to identify what is wrong. I would manually check to make sure that HiveServer as well as every executor node is able to reach your Elasticsearch nodes at their published HTTP addresses, as detailed in their node info output curl -X GET localhost:9200/_nodes/http

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.