Connecting hadoop and elasticsearch

varunakuraju · August 21, 2019, 5:32pm

Hi all,

I am new to Elasticsearch and Kibana.
I want to establish a two-way connection between HDFS and Elasticsearch.
I tried Hive external tables, but it's not working, and I haven't found another way.
Steps I have done:
Installed Elasticsearch and Kibana.
For Hadoop, I created a Hadoop cluster in Google Cloud and connected to its terminal over SSH. I downloaded ES-Hadoop (https://www.elastic.co/downloads/hadoop), loaded data into Hadoop, and created tables in Hive. I then added elasticsearch-hadoop-hive-7.3.0.jar to the classpath and tried to create external tables that would create indexes in Elasticsearch, but it didn't work. Is there any command to verify the connectivity between Hadoop and Elasticsearch?
Can anyone please suggest which method is better and how to implement it?
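One way to check connectivity by hand is to request the Elasticsearch root endpoint from a Hadoop node, for example with `curl http://<es-host>:9200/`; it returns JSON that includes `version.number`, which is roughly the information ES-Hadoop needs when it tries to detect the Elasticsearch version. Below is a minimal Python sketch of the same check, assuming Python 3 on the Hadoop nodes and an Elasticsearch node serving plain HTTP without authentication. `parse_es_version` and `detect_es_version` are hypothetical helpers, and `<es-host>` is a placeholder for the address you would put in `es.nodes`.

```python
import json
import urllib.request


def parse_es_version(body: bytes) -> str:
    """Return version.number from the JSON that GET / on Elasticsearch returns."""
    info = json.loads(body.decode("utf-8"))
    return info["version"]["number"]


def detect_es_version(host: str, port: int = 9200, timeout: float = 5.0) -> str:
    """Hypothetical helper: fetch http://host:port/ and return the reported version.

    Assumes plain HTTP with no authentication. Raises OSError
    (urllib.error.URLError is a subclass) when the node cannot be reached.
    """
    url = f"http://{host}:{port}/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_es_version(resp.read())
```

Running `detect_es_version("<es-host>")` on the node where Hive runs should return a version string such as the one your cluster reports; an exception there suggests the Hive job will fail to reach Elasticsearch as well.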

Thank you

james.baiera · August 23, 2019, 4:59pm

Welcome to the fold, glad you're trying out the products!

Can you include the full error message from your attempts? Without it, we'd be pretty limited in how much we can help.

If you don't mind me asking, where do you have Elasticsearch running? If you're running Hadoop in a cloud environment, you might need to configure the network to allow the Hadoop worker nodes to communicate with the Elasticsearch cluster.
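A minimal sketch of the network check described above, assuming Python 3 is available on the worker nodes. `can_reach` is a hypothetical helper, and `<es-host>` stands for the address you would put in `es.nodes`. It only tests whether a TCP connection to the Elasticsearch HTTP port (9200 by default) can be opened, which helps separate firewall or routing problems from Elasticsearch-level ones. Note that Elasticsearch binds to loopback addresses by default, so a node running on a local machine may also need `network.host` configured, plus a route from the cloud network to that machine.

```python
import socket


def can_reach(host: str, port: int = 9200, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts
        return False
```

Running `can_reach("<es-host>")` on every worker node is a quick first test; a `False` on any node suggests tasks scheduled there would not reach Elasticsearch either.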

varunakuraju · August 23, 2019, 7:45pm

Hi @james.baiera,
I am running Elasticsearch on my local machine and Hadoop/Hive in the cloud.

> create external table ess_vgsale(
> rank int,
> name string,
> platform string,
> year int,
> genre string,
> publisher string,
> nasales int,
> eusales int,
> jpsales int,
> othersales int,
> globalsales int)
> stored by "org.elasticsearch.hadoop.hive.EsStorageHandler"
> tblproperties("es.resource"="ess_vgsale/vgsaleee",
> "es.index.auto.create"="true",
> "es.nodes.wan.only"="true",
> "es.nodes"="localhost")
> ;

OK
Time taken: 0.13 seconds

Instead of Tez, I am using MapReduce as the execution engine:

hive> set hive.execution.engine=mr;

I am trying to insert data from vgsale into ess_vgsale:

hive> insert overwrite table ess_vgsale select * from vgsale limit 5;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = saivarunakuraju_20190823183022_dd8be189-22e7-4a9e-bf6b-8b2139a7f93b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
    at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:262)
    at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:253)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1155)
    at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:281)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:145)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:411)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:152)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:424)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
    at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
    ... 40 more
Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only')'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only
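The `Caused by` line shows the connector tried `localhost:9200`, and the frames under `JobSubmitter.checkSpecs` suggest this happened while the job was being submitted from the Hive client. With `"es.nodes"="localhost"`, that name appears to be resolved on the cloud machine running Hive rather than on the local machine where Elasticsearch runs, and `es.nodes.wan.only` would not change that. Below is a minimal pre-flight sketch, assuming you want to flag such entries before running the insert. `parse_es_nodes`, `is_loopback`, and `loopback_nodes` are hypothetical helpers; the 9200 fallback mirrors ES-Hadoop's default for `es.port`, and the parser does not handle IPv6 literals.

```python
import ipaddress
import socket
from typing import List, Tuple

DEFAULT_ES_PORT = 9200  # ES-Hadoop's default for es.port


def parse_es_nodes(nodes: str, default_port: int = DEFAULT_ES_PORT) -> List[Tuple[str, int]]:
    """Hypothetical helper: split an es.nodes value like "h1,h2:9201" into (host, port)."""
    pairs = []
    for entry in nodes.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep and port.isdigit():
            pairs.append((host, int(port)))
        else:
            # No explicit port: fall back to the default, as the log's localhost:9200 shows
            pairs.append((entry, default_port))
    return pairs


def is_loopback(host: str) -> bool:
    """Hypothetical helper: True if host resolves to a loopback address on THIS machine."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    # info[4][0] is the address string; drop any IPv6 scope suffix like "%eth0"
    return any(ipaddress.ip_address(info[4][0].split("%")[0]).is_loopback for info in infos)


def loopback_nodes(nodes: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: es.nodes entries that point back at the machine running the check."""
    return [(h, p) for h, p in parse_es_nodes(nodes) if is_loopback(h)]
```

Run on the Hive machine, a non-empty result from `loopback_nodes` for the value in `tblproperties` indicates the connector would be talking to that machine itself; in that case `es.nodes` likely needs an address of the Elasticsearch host that is reachable from the cloud cluster.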

Thank you

system · September 20, 2019, 7:50pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
