Unable to connect through Hadoop


#1

My pig code:

register /PATH/elasticsearch-hadoop-2.1.2.jar
A = LOAD 'logstash-2016.01.08.18/logs' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=http://my-url-here.net:9200');

I get the below error saying:

ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.xx.xx.xx:9200] failed (Connection timed out); selected next node [http://my-url-here.net:9200]
2016-01-08 19:29:51,350 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out

Does anyone know what's going on? For some reason, the job is unable to connect to the ES node, even though I can curl the same server just fine.
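For reference, the check I'm running directly on the server looks roughly like this (my-url-here.net stands in for the real hostname):

```shell
# Hitting the ES root endpoint directly - this responds fine from the server
curl http://my-url-here.net:9200
```

So the node itself is up and reachable outside of Hadoop.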


(Costin Leau) #2

You are hitting a bug caused by ES-Hadoop not properly translating hostnames to IPs. Try 2.2-rc1 instead.

Cheers,


#3

Hi Costin, thanks for the reply. I tried 2.2-rc1, and now I'm getting:

ERROR 2118: Cannot resolve ip for hostname: http://my_url_here

Would changing the hostname to an IP work?


(Costin Leau) #4

Can you post the error that you are getting? It sounds like a bug.


#5

Yep, here it is:

2016-01-11 18:29:41,285 [JobControl] INFO org.elasticsearch.hadoop.util.Version - Elasticsearch Hadoop v2.2.0-rc1 [ab852c0eb0]
2016-01-11 18:29:41,447 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13201
2016-01-11 18:29:41,454 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
2016-01-11 18:29:41,455 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)


(Costin Leau) #6

Can you try specifying the address without http://? I wonder whether it makes any difference. And yes, changing the hostname to an IP should work; however, as long as the hostname is resolvable, that shouldn't be necessary.
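Something along these lines, reusing the placeholder hostname from your original script - just drop the scheme from es.nodes:

```pig
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- es.nodes without the http:// prefix
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=my-url-here.net:9200');
```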


#7

Removing the http:// works, and it's now trying to connect to the correct node. However, I'm still unable to get a connection from Hadoop. I just confirmed again that I can get a response from the same server through curl.

2016-01-11 19:27:06,220 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-01-11 19:28:09,480 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:28:09,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:31:18,482 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); selected next node [54.x.x.x:9200]
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:35:30,563 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); no other nodes left - aborting...
2016-01-11 19:35:30,564 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13208
2016-01-11 19:35:30,570 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]
2016-01-11 19:35:30,570 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]


(Costin Leau) #8

It looks like you have a hosted ES setup: the client first connects through a routable IP (54.x.x.x), but the subsequent connections are made to private IPs (10.x.x.x). That's typical of a WAN/cloud setup - can you try the settings for that setup?


#9

Thanks so much for the help, Costin. I set 'es.nodes.wan.only=true' and it's working now!
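For anyone finding this thread later, the working load statement looks roughly like this (jar path and hostname are placeholders, as above):

```pig
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- es.nodes.wan.only=true makes the connector talk only to the declared node,
-- instead of discovering and dialing the cluster's private IPs
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=my-url-here.net:9200',
        'es.nodes.wan.only=true');
```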


(Costin Leau) #10

The hostname issue was fixed in master and will be picked up by the next nightly build.
