Hi,
I am facing an issue while running Hive map/reduce jobs with the es-hadoop connector. The error is:
Vertex failed, vertexName=Map 1, vertexId=vertex_1499809462083_0021_1_00, diagnostics=[Vertex vertex_1499809462083_0021_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: inboundrecordsesdatasnapshottable initializer failed, vertex=vertex_1499809462083_0021_1_00 [Map 1], org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
I am trying to use the es-hadoop connector to run some Hive queries against my ES data.
- ES: 2.3.3
- EMR: emr-5.6.0
- ES-Hadoop: 5.4.1
These are the steps I followed:
- I created a 3-node EMR cluster (AWS emr-5.6.0).
- I installed ES 2.3.3 on the master node and successfully started a single-node ES cluster.
- I can request data successfully with 'curl localhost:9200 ...'
- I then added the es-hadoop jar to Hive and mapped the ES data to a Hive table; the Hive query that creates this mapping runs successfully.
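The 'curl localhost:9200' check only confirms that ES answers on the master node itself. The stack trace below shows the failing connection being made during Tez split generation (HiveSplitGenerator, called from Tez's RootInputInitializerManager), which may run in a YARN container on a different node from the Hive CLI, so it could be worth repeating a reachability check on every node of the cluster. A minimal Python 3 sketch of such a check follows, assuming Python 3 is available on the nodes; port_open is a hypothetical helper name, and it only tests whether a TCP connection to host:port can be opened, not whether ES itself is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError subclass)
        # and name-resolution failures.
        return False
```

On each node, calling port_open("127.0.0.1", 9200) and then port_open with the master node's private address and port 9200 would show which address, if any, reaches ES from that machine.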
The query select * from my-table runs perfectly.
However, when I run select count(*) from my-table, I get an org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException.
Full stack trace:
Vertex failed, vertexName=Map 1, vertexId=vertex_1499809462083_0022_1_00, diagnostics=[Vertex vertex_1499809462083_0022_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: inboundrecordsesdatasnapshottable initializer failed, vertex=vertex_1499809462083_0022_1_00 [Map 1], org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:469)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:547)
at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexStatus(InitializationUtils.java:71)
at org.elasticsearch.hadoop.rest.InitializationUtils.validateSettingsForReading(InitializationUtils.java:260)
at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:217)
at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:405)
at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:114)
at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:50)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:363)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:486)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:200)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1499809462083_0022_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1499809462083_0022_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1499809462083_0022_1_00, diagnostics=[Vertex vertex_1499809462083_0022_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: inboundrecordsesdatasnapshottable initializer failed, vertex=vertex_1499809462083_0022_1_00 [Map 1], org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
The ES cluster appears to be set up correctly, as the following curl output shows:
curl 127.0.0.1:9200/_nodes/http?pretty
{
  "cluster_name" : "addhawan-test",
  "nodes" : {
    "wXWSYLccR6u7uouV8ua-Pw" : {
      "name" : "Leonard Samson",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "2.3.3",
      "build" : "218bdf1",
      "http_address" : "127.0.0.1:9200",
      "http" : {
        "bound_address" : [ "[::1]:9200", "127.0.0.1:9200" ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}
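The _nodes/http output above reports a publish_address of 127.0.0.1:9200, which is a loopback address: other machines cannot use it to reach this node, and on any other node 127.0.0.1 refers to that node itself. Below is a small Python sketch that parses such a response and lists the nodes whose HTTP publish address is loopback. It assumes the ES 2.x response shape shown above (nodes.<id>.http.publish_address as "ip:port"), also strips an optional "hostname/" prefix in case a newer ES version formats the address that way, and uses a hypothetical function name.

```python
import ipaddress
import json


def loopback_http_nodes(nodes_http_json: str) -> list:
    """Return (node name, publish_address) pairs whose HTTP publish address is loopback."""
    data = json.loads(nodes_http_json)
    flagged = []
    for node_id, node in data.get("nodes", {}).items():
        publish = node.get("http", {}).get("publish_address", "")
        # Drop an optional "hostname/" prefix, split off the port, and remove IPv6 brackets.
        addr = publish.rsplit("/", 1)[-1]
        host = addr.rsplit(":", 1)[0].strip("[]")
        try:
            is_loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Not a literal IP address (e.g. empty or a hostname): leave it unflagged.
            continue
        if is_loopback:
            flagged.append((node.get("name", node_id), publish))
    return flagged
```

Fed the response above, it would flag the single node "Leonard Samson" with publish address 127.0.0.1:9200.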
I am not able to debug this error any further. I have looked through the Hive logs, but found nothing helpful.