Unable to connect to Elasticsearch through Hadoop

My pig code:

register /PATH/elasticsearch-hadoop-2.1.2.jar
A = LOAD 'logstash-2016.01.08.18/logs' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=http://my-url-here.net:9200');

I get the following error:

ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.xx.xx.xx:9200] failed (Connection timed out); selected next node [http://my-url-here.net:9200]
2016-01-08 19:29:51,350 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out

Does anyone know what's going on? For some reason it can't connect to the ES node, even though a curl to the same server works fine.

You are hitting a bug caused by ES-Hadoop not properly translating hostnames to IPs. Try 2.2-rc1 instead.

Cheers,

Hi Costin, thanks for the reply. I tried 2.2-rc1, and now I'm getting:

ERROR 2118: Cannot resolve ip for hostname: http://my_url_here

Would changing the hostname to an IP work?

Can you post the error that you are getting - it sounds like a bug.

Yep, here it is:

2016-01-11 18:29:41,285 [JobControl] INFO org.elasticsearch.hadoop.util.Version - Elasticsearch Hadoop v2.2.0-rc1 [ab852c0eb0]
2016-01-11 18:29:41,447 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13201
2016-01-11 18:29:41,454 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
2016-01-11 18:29:41,455 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)

Can you try specifying the address without http://? I wonder whether it makes any difference. And yes, changing the hostname to an IP should work; however, as long as the hostname is resolvable, that shouldn't be necessary.
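For reference, here is a minimal sketch of what that would look like, reusing the jar path and index from the original snippet (the exact rc1 jar name is an assumption, check your download):

```
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- es.nodes given as host:port only, with no http:// scheme
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=my-url-here.net:9200');
```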

Looks like removing the http:// works, and it's trying to connect to the correct node now. However, the Hadoop job is still unable to get a connection. I just checked that I can get a response from the same server through curl.

2016-01-11 19:27:06,220 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-01-11 19:28:09,480 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:28:09,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:31:18,482 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); selected next node [54.x.x.x:9200]
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:35:30,563 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); no other nodes left - aborting...
2016-01-11 19:35:30,564 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13208
2016-01-11 19:35:30,570 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]
2016-01-11 19:35:30,570 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]

It looks like you have a hosted ES setup - the client first connects through a routable IP (54.x.x), but the follow-up connections are made to private IPs (10.x.x.x) that the cluster advertises internally. That points to a wan/cloud setup - can you try enabling the wan/cloud mode?

Thanks so much for the help, Costin. I set 'es.nodes.wan.only=true' and it's working now!
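For anyone landing here later, a sketch of the working combination (scheme-less es.nodes plus WAN-only mode; jar path, jar version, and index name are taken from the snippets above and may differ in your setup):

```
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- es.nodes.wan.only=true makes the connector talk only to the declared
-- node instead of the private IPs the cluster advertises internally
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=my-url-here.net:9200',
        'es.nodes.wan.only=true');
```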


The hostname issue was fixed in master through

It will be picked up by the next nightly build.