Connection error between apache-hive and elasticsearch


(Yash M.) #1

Hello Folks,

I'm trying to integrate Apache Hive with Elasticsearch. To do so, I am following the steps below.

Created a table in Hive:

CREATE TABLE scadapass (VENDOR string, Device string, Default_Password string, Port string, Device_Type string, Protocol string, Source string) row format delimited fields terminated by ',' stored as textfile;

Loaded data from a CSV file:

LOAD DATA LOCAL INPATH '/home/elk/Desktop/scadapass.csv' OVERWRITE INTO TABLE scadapass;

Added the jar using the Hive CLI:

add jar /home/elk/Desktop/elasticsearch-hadoop-hive-6.1.3.jar;

Now, when I try to create the external table:

CREATE EXTERNAL TABLE scadapass_es (VENDOR string, Device string, Default_Password string, Port string, Device_Type string, Protocol string, Source string) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'scadapass/pass');

the query throws the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Any help in this direction would be appreciated.

Thanks,
Yash


(James Baiera) #2

Are you specifying the list of ES nodes to connect to?


(Yash M.) #3

Where can I specify the list of ES nodes in this process? I'm assuming there is a jar file that I have to add to my classpath from the Hive CLI prompt before running the query that creates the external table. I'm also assuming that this loaded jar will let me establish a connection between Hive and Elasticsearch, so that I can create the table directly from the Hive CLI.

If any changes are required, please let me know what configuration I have to change. If you have a sample, please share the approach with us.


(James Baiera) #4

Even though the jar provides the code to connect to Elasticsearch, you still need to use es.nodes to tell it where your Elasticsearch cluster is hosted. I don't see you setting es.nodes anywhere in your examples.


(Yash M.) #5

Correct. So where do I have to specify the es.nodes setting: in the Hive query or somewhere else?


(James Baiera) #6

You would specify it as part of the table properties (TBLPROPERTIES). The elasticsearch-hadoop documentation has more information about configuring the Hive integration, and its configuration reference describes the available options.
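
As a rough sketch of what that could look like for the table from post #1: the statement below is the same CREATE EXTERNAL TABLE with es.nodes and es.port added to the table properties. The host name es-host.example.com and port 9200 are placeholders and not taken from this thread, so substitute the address where your cluster is actually reachable from the Hive machines.

```sql
-- Sketch only: 'es-host.example.com' and '9200' are hypothetical values;
-- replace them with the host and HTTP port of your own Elasticsearch cluster.
CREATE EXTERNAL TABLE scadapass_es (
  VENDOR string,
  Device string,
  Default_Password string,
  Port string,
  Device_Type string,
  Protocol string,
  Source string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'scadapass/pass',       -- index/type, as in the original query
  'es.nodes'    = 'es-host.example.com',  -- where the Elasticsearch cluster is hosted
  'es.port'     = '9200'                  -- HTTP port of the cluster
);
```

If the cluster sits behind a WAN or cloud endpoint, the error message in post #1 suggests that 'es.nodes.wan.only' may also need to be set in the same TBLPROPERTIES list. Once the table is created, the rows loaded earlier could presumably be copied over with INSERT OVERWRITE TABLE scadapass_es SELECT * FROM scadapass;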

