Problem retrieving data from Elasticsearch with Spark

(Yasmeen Chakrayapeta) #1

Hi Team,

I'm quite new to ES and ES-Hadoop. This is the code I use to pull data out of ES:

es_read_conf = {
    "es.nodes": "",
    "es.port": "80",
    "es.resource": "temprollover/rollover",
    "es.input.json": "yes"
}

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_read_conf)

I am trying to retrieve data from Elasticsearch and convert it into an RDD, but I get the error below. Can you please help?

Traceback (most recent call last):
File "/home/hadoop/rdd-spark.py", line 21, in <module>
conf=es_read_conf)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 751, in newAPIHadoopRDD
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:159)
at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:223)
at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:412)

(James Baiera) #2

The error message points to an issue with your Elasticsearch cluster: ES-Hadoop cannot find any node that is both a data node and reachable over HTTP. I would double-check your deployment to make sure such nodes exist and are reachable from the machines where ES-Hadoop runs (the Spark driver and executors).
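
One way to check this from the Spark side is to ask the cluster which nodes it reports and look for ones that hold data and publish an HTTP address. The snippet below is only a rough sketch of that check, not what ES-Hadoop does internally. It assumes the `GET /_nodes/http` response layout of Elasticsearch 5.x or later (a `roles` list and an `http.publish_address` field per node); the helper names and the host in the usage note are made up for illustration.

```python
import json
from urllib.request import urlopen


def http_data_nodes(nodes_info):
    """Return (name, publish_address) pairs for nodes that hold data and expose HTTP."""
    found = []
    for node in nodes_info.get("nodes", {}).values():
        roles = node.get("roles", [])
        http = node.get("http") or {}
        # Treat "data" and tiered roles such as "data_hot" as data nodes.
        is_data = any(r == "data" or r.startswith("data_") for r in roles)
        if is_data and http.get("publish_address"):
            found.append((node.get("name"), http["publish_address"]))
    return found


def fetch_nodes_info(base_url):
    """Fetch node info; base_url is a placeholder like "http://my-es-host:80"."""
    with urlopen(base_url.rstrip("/") + "/_nodes/http") as resp:
        return json.load(resp)
```

Running `http_data_nodes(fetch_nodes_info("http://my-es-host:80"))` from the Spark driver host (with your real endpoint in place of the placeholder) should give a non-empty list. If it is empty, or the published addresses are not routable from the Spark machines, that would be consistent with the exception above.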

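If the cluster is only reachable through a proxy, load balancer, or hosted endpoint (port 80 in the posted config might hint at that, but this is a guess), node discovery can return addresses the Spark machines cannot reach. ES-Hadoop has an `es.nodes.wan.only` setting for that case, which makes the connector talk only to the hosts declared in `es.nodes`. A minimal sketch, assuming that is the situation here; the helper name and the host value are placeholders, not taken from the original post:

```python
def wan_only_read_conf(nodes, port, resource):
    """Build an ES-Hadoop read config that connects only through the declared nodes."""
    return {
        "es.nodes": nodes,        # placeholder host(s); use your real endpoint
        "es.port": str(port),
        "es.resource": resource,
        # Disable node discovery and route all requests through es.nodes.
        "es.nodes.wan.only": "true",
    }


es_read_conf = wan_only_read_conf("my-es-host", 80, "temprollover/rollover")
```

The resulting dict can be passed as `conf=es_read_conf` to `sc.newAPIHadoopRDD` exactly as in the original code. Routing everything through one endpoint gives up direct connections to individual data nodes, so it is worth trying only if direct access to the nodes is not possible.
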