Connection error between apache-hive and elasticsearch

Hello Folks,

I'm trying to integrate Apache Hive with Elasticsearch. To do so, I am following the steps below.

Created a table in Hive:

CREATE TABLE scadapass (VENDOR string, Device string, Default_Password string, Port string, Device_Type string, Protocol string, Source string) row format delimited fields terminated by ',' stored as textfile;

Loaded data from a CSV file:

LOAD DATA LOCAL INPATH '/home/elk/Desktop/scadapass.csv' OVERWRITE INTO TABLE scadapass;

Added the jar using the Hive CLI:

ADD JAR /home/elk/Desktop/elasticsearch-hadoop-hive-6.1.3.jar;

Now, when I try to create the external table:

CREATE EXTERNAL TABLE scadapass_es (VENDOR string, Device string, Default_Password string, Port string, Device_Type string, Protocol string, Source string) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'scadapass/pass');

the query throws the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Any help with this would be appreciated.

Thanks,
Yash

Are you specifying the list of ES nodes to connect to?

Where can I specify the list of ES nodes in this process? I'm assuming there is a jar file that I have to add to my classpath from the Hive CLI prompt before running the query that creates the external table. I'm also assuming that this jar will let me establish a connection between Hive and Elasticsearch, so that I can create the table directly from the Hive CLI.

If any changes are required, please let me know what configuration I have to change. If you have a sample, please share it.

Even though the jar provides the code to connect to Elasticsearch, you still need to set es.nodes to tell it where your Elasticsearch cluster is hosted. I don't see es.nodes set anywhere in your examples.

Correct. So where do I have to specify the es.nodes setting: in the Hive query or somewhere else?

You would specify it as part of the table properties. The elasticsearch-hadoop documentation has more information in its sections on the Hive integration and on configuration options.
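For illustration, here is a minimal sketch of what that could look like, reusing the table definition from the original post. The host name es-host.example.com and the port 9200 are placeholders and not values from this thread; replace them with the address of your own cluster. The es.nodes.wan.only line is only likely to be relevant if the cluster sits behind a cloud or WAN endpoint, as the error message hints.

```sql
-- Sketch only: host and port below are assumed placeholder values.
CREATE EXTERNAL TABLE scadapass_es (
  VENDOR string,
  Device string,
  Default_Password string,
  Port string,
  Device_Type string,
  Protocol string,
  Source string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'scadapass/pass',
  -- Where the Elasticsearch cluster is reachable (placeholder host).
  'es.nodes' = 'es-host.example.com',
  -- REST port of the cluster (placeholder; 9200 is the usual default).
  'es.port' = '9200'
  -- For a cloud/WAN instance you may also need:
  -- , 'es.nodes.wan.only' = 'true'
);

-- Once the external table exists, rows can be copied from the Hive table.
INSERT OVERWRITE TABLE scadapass_es
SELECT * FROM scadapass;
```

If the error persists with es.nodes set, it may be worth checking that the Hive machine can actually reach that host and port over the network.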
