Connecting Hadoop and Elasticsearch

Hi all,

I am new to Elasticsearch and Kibana.
I want to establish a two-way connection between HDFS and Elasticsearch.
I tried Hive external tables, but it's not working, and I can't find a way to make it work.
Steps I have done:
I installed Elasticsearch and Kibana.
For Hadoop, I created a Hadoop cluster in Google Cloud and connected to its terminal over SSH with a key. I downloaded ES-Hadoop (https://www.elastic.co/downloads/hadoop), loaded data into Hadoop, and created tables in Hive. In Hadoop, I added elasticsearch-hadoop-hive-7.3.0.jar to the classpath and tried to create external tables that would create indices in Elasticsearch, but couldn't get it to work. Is there a command to verify the connectivity between Hadoop and Elasticsearch?
Can anyone please suggest which method is better and how to implement it?

Thank you

Welcome to the fold, glad you're trying out the products!

Can you include the full error message from your attempts? Without it, we'd be pretty limited in how much we can help.

If you don't mind me asking, where do you have Elasticsearch running? If you're running Hadoop in a cloud environment you might need to configure the network to allow the Hadoop worker nodes to communicate with the Elasticsearch cluster.
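
One way to check that connectivity is to request the Elasticsearch root endpoint from a Hadoop worker node, since that is roughly what the connector does when it tries to detect the ES version. Below is a minimal sketch in Python, not an official tool: it assumes Elasticsearch serves plain HTTP on port 9200 without authentication, and `check_elasticsearch` and the host passed to it are hypothetical names for illustration.

```python
import json
import urllib.error
import urllib.request


def check_elasticsearch(host, port=9200, timeout=5.0):
    """Return (reachable, detail) for an Elasticsearch HTTP endpoint.

    Assumes plain HTTP and no authentication (an assumption, adjust as needed).
    """
    url = f"http://{host}:{port}/"
    # Connect directly, ignoring any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            info = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return False, f"{url} not reachable: {exc}"
    if not isinstance(info, dict):
        return False, f"{url} answered, but not with the expected JSON object"
    # The root endpoint of Elasticsearch reports its version under version.number.
    version = info.get("version", {}).get("number", "unknown")
    return True, f"{url} reachable, Elasticsearch version {version}"


if __name__ == "__main__":
    # Replace "localhost" with the address the Hadoop nodes should use.
    ok, detail = check_elasticsearch("localhost")
    print(detail)
```

Run it on a worker node rather than on the machine where Elasticsearch lives. If it reports that the endpoint is not reachable, the Hive job will most likely fail the same way.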

Hi @james.baiera ,
I am running Elasticsearch on my local machine and Hadoop/Hive in the cloud.

> create external table ess_vgsale(
> rank int,
> name string,
> platform string,
> year int,
> genre string,
> publisher string,
> nasales int,
> eusales int,
> jpsales int,
> othersales int,
> globalsales int)
> stored by "org.elasticsearch.hadoop.hive.EsStorageHandler"
> tblproperties("es.resource"="ess_vgsale/vgsaleee",
> "es.index.auto.create"="true",
> "es.nodes.wan.only"="true",
> "es.nodes"="localhost")
> ;

OK
Time taken: 0.13 seconds

Instead of Tez, I am using MapReduce:

hive> set hive.execution.engine=mr;

I am trying to insert data from vgsale into ess_vgsale:

hive> insert overwrite table ess_vgsale select * from vgsale limit 5;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = saivarunakuraju_20190823183022_dd8be189-22e7-4a9e-bf6b-8b2139a7f93b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
    at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:262)
    at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:253)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1155)
    at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:281)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:145)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:411)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:152)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:424)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
    at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
    ... 40 more
Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only')'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only
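
For reading the trace: the `tried [[localhost:9200]]` part of the `Caused by` line names the address the connector attempted. On a Hadoop node in the cloud, `localhost` resolves to that node itself, not to a machine elsewhere. Here is a rough Python sketch that pulls the attempted addresses out of such a message and flags loopback hosts. The helper names `attempted_nodes` and `is_loopback` are made up for illustration, and splitting multiple nodes on commas is an assumption about the message format (the trace here only shows a single node). IPv6 literals are not handled.

```python
import ipaddress
import re


def attempted_nodes(message):
    """Extract the host:port entries from a 'tried [[...]]' fragment."""
    match = re.search(r"tried \[\[(.*?)\]\]", message)
    if not match:
        return []
    # Assumption: several nodes would be separated by commas.
    return [node.strip() for node in match.group(1).split(",") if node.strip()]


def is_loopback(node):
    """Return True if the host part of 'host:port' points at the local machine."""
    host = node.rsplit(":", 1)[0]
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); cannot tell without resolving.
        return False


if __name__ == "__main__":
    error = ("Connection error (check network and/or proxy settings)- "
             "all nodes failed; tried [[localhost:9200]]")
    for node in attempted_nodes(error):
        note = "loopback, resolves to the node running the job" if is_loopback(node) else "remote"
        print(f"{node}: {note}")
```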

Thank you