register /PATH/elasticsearch-hadoop-2.1.2.jar
A = LOAD 'logstash-2016.01.08.18/logs' USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=http://my-url-here.net:9200');
I get the following error:
ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.xx.xx.xx:9200] failed (Connection timed out); selected next node [http://my-url-here.net:9200]
2016-01-08 19:29:51,350 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
Does anyone know what's going on? For some reason it's unable to connect to the ES node, even though I can curl the same server just fine.
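One thing worth checking: the curl test needs to run on the Hadoop worker nodes, not only on the client machine that launches Pig, because the map tasks open their own connections to ES. A minimal sketch (the host and port are placeholders matching the script above):

```shell
# Hypothetical endpoint; substitute the value used in es.nodes.
ES_HOST="my-url-here.net"
ES_PORT="9200"

# Run this from each Hadoop worker node: a curl that succeeds on the
# edge node proves nothing about the cluster-to-ES network path.
curl -s -m 5 "http://${ES_HOST}:${ES_PORT}/" \
  || echo "cannot reach ${ES_HOST}:${ES_PORT} from this host"
```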
2016-01-11 18:29:41,285 [JobControl] INFO org.elasticsearch.hadoop.util.Version - Elasticsearch Hadoop v2.2.0-rc1 [ab852c0eb0]
2016-01-11 18:29:41,447 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13201
2016-01-11 18:29:41,454 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
2016-01-11 18:29:41,455 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Cannot resolve ip for hostname: http://my_url_here
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Can you try specifying the address without http://? I wonder if it makes any difference. And yes, changing the hostname to an IP should work; however, as long as the hostname is resolvable, that shouldn't be necessary.
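In other words, the suggestion amounts to dropping the scheme from `es.nodes` (host, port, and index here are the placeholders from the original script):

```pig
register /PATH/elasticsearch-hadoop-2.1.2.jar
-- es.nodes takes host:port; the http:// scheme is not needed
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=my-url-here.net:9200');
```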
Looks like removing the http:// works, and it's now trying to connect to the correct node. However, I still can't get a connection from Hadoop. I just verified that I can get a response from the same server through curl.
2016-01-11 19:27:06,220 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-01-11 19:28:09,480 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:28:09,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:29:12,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:30:15,481 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:31:18,482 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); selected next node [54.x.x.x:9200]
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:32:21,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:33:24,554 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection timed out
2016-01-11 19:34:27,559 [JobControl] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
2016-01-11 19:35:30,563 [JobControl] ERROR org.elasticsearch.hadoop.rest.NetworkClient - Node [10.45.56.56:9200] failed (Connection timed out); no other nodes left - aborting...
2016-01-11 19:35:30,564 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/etl/.staging/job_1448985836863_13208
2016-01-11 19:35:30,570 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:etl (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]
2016-01-11 19:35:30,570 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.x.x.x:9200]]
It looks like you have a hosted ES setup - namely, the configured address resolves to a routable IP (54.x.x.x), but the subsequent connections are made to private IPs (10.x.x.x) discovered from the cluster. Likely you need the WAN/cloud setup - can you try that setting?
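For hosted/cloud clusters, es-hadoop provides `es.nodes.wan.only=true` (available from 2.2.0, which the logs above show is the version actually on the classpath): it restricts the connector to the configured address instead of discovering and connecting to the nodes' private IPs. A sketch, reusing the placeholder host and index from earlier:

```pig
register /PATH/elasticsearch-hadoop-2.2.0-rc1.jar
-- wan.only disables node discovery, so only my-url-here.net:9200 is contacted
A = LOAD 'logstash-2016.01.08.18/logs'
    USING org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=my-url-here.net:9200',
        'es.nodes.wan.only=true');
```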