Unable to connect through Hadoop


#1

My pig code:

register /PATH/elasticsearch-hadoop-2.1.2.jar
A = LOAD 'logstash-2016.01.08.18/logs' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=http://my-url-here.net:9200');

I get the below error saying:

ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.xx.xx.xx:9200] failed (Connection timed out); selected next node [http://my-url-here.net:9200]
2016-01-08 19:29:51,350 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out

Does anyone know what's going on? For some reason, the job is unable to connect to the ES node, even though I can curl the same server just fine.
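For reference, the check I'm running directly on the server looks roughly like this (my-url-here.net stands in for the real hostname):

```shell
# Hitting the ES root endpoint directly - this responds fine from the server
curl http://my-url-here.net:9200
```

So the node itself is up and reachable outside of Hadoop.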


(Costin Leau) #2

You are hitting a bug caused by ES-Hadoop not properly translating hostnames to IPs. Try 2.2-rc1 instead.

Cheers,


#3

Hi Costin, thanks for the reply. I tried 2.2-rc1, and now I'm getting:

ERROR 2118: Cannot resolve ip for hostname: http://my_url_here

Would changing the hostname to an IP work?


(Costin Leau) #4

Can you post the error that you are getting? It sounds like a bug.


#5

Yep, here it is:

2016-01-11 18:29:41,285 [JobControl] INFO org.elasticsearch.hadoop.util.Version - Elasticsearch Hadoop v2.2.0-rc1 [ab852c0eb0]
2016-01-11 18:29:41,447 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13201
2016-01-11 18:29:41,454 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
2016-01-11 18:29:41,455 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)


(Costin Leau) #6

Can you try specifying the address without http://? I wonder whether it makes any difference. And yes, changing the hostname to an IP should work; however, as long as the hostname is resolvable, that shouldn't be necessary.
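Something along these lines, reusing the placeholder hostname from your original script - just drop the scheme from es.nodes:

```pig
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- es.nodes without the http:// prefix
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=my-url-here.net:9200');
```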


#7

Removing the http:// works, and it's now trying to connect to the correct node. However, I'm still unable to get a connection from Hadoop. I just confirmed again that I can get a response from the same server through curl.

2016-01-11 19:27:06,220 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-01-11 19:28:09,480 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:28:09,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:31:18,482 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); selected next node [54.x.x.x:9200]
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:35:30,563 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); no other nodes left - aborting...
2016-01-11 19:35:30,564 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13208
2016-01-11 19:35:30,570 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]
2016-01-11 19:35:30,570 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]


(Costin Leau) #8

It looks like you have a hosted ES setup: the client first connects through a routable IP (54.x.x.x), but the subsequent connections are made to private IPs (10.x.x.x). That's typical of a WAN/cloud setup - can you try the settings for that setup?


#9

Thanks so much for the help, Costin. I set 'es.nodes.wan.only=true' and it's working now!
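For anyone finding this thread later, the working load statement looks roughly like this (jar path and hostname are placeholders, as above):

```pig
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- es.nodes.wan.only=true makes the connector talk only to the declared node,
-- instead of discovering and dialing the cluster's private IPs
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=my-url-here.net:9200',
        'es.nodes.wan.only=true');
```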


(Costin Leau) #10

The hostname issue was fixed in master and will be picked up by the next nightly build.
