Can you help to check this error please?

Kramer_Li · March 14, 2016, 11:24am

I'm trying to read data from Elasticsearch into Spark.

conf = {"es.resource":"sflow_*/sflow","es.nodes":"ES01","es.query":'some query'}

rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat", "org.apache.hadoop.io.NullWritable", "org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=conf)

rdd.take(2)

After rdd.take(2), the process gets stuck and logs a warning like the one below:

16/03/14 20:52:07 WARN httpclient.SimpleHttpConnectionManager: SimpleHttpConnectionManager being used
incorrectly.  Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or 
method is using this connection manager at a time.

But rdd.first() always returns a result successfully. Do you know why?
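
For anyone trying to reproduce this, here is a minimal sketch of the same read wrapped in two small helpers. The helper names build_es_conf and read_es_rdd are hypothetical (they are not part of es-hadoop or PySpark); the class names and settings are copied from the snippet above. It assumes an existing SparkContext `sc` (for example the one the pyspark shell provides) and that the es-hadoop jar is on the classpath.

```python
def build_es_conf(resource, nodes, query):
    """Build the es-hadoop settings dict passed to newAPIHadoopRDD.

    Hypothetical helper; the keys are the ones used in the post above.
    """
    return {
        "es.resource": resource,  # index/type to read from
        "es.nodes": nodes,        # Elasticsearch host(s)
        "es.query": query,        # query used to select documents
    }


def read_es_rdd(sc, conf):
    """Create an RDD backed by EsInputFormat.

    `sc` is assumed to be an existing SparkContext.
    """
    return sc.newAPIHadoopRDD(
        "org.elasticsearch.hadoop.mr.EsInputFormat",
        "org.apache.hadoop.io.NullWritable",
        "org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


# Usage in the pyspark shell, where `sc` already exists:
#   conf = build_es_conf("sflow_*/sflow", "ES01", "some query")
#   rdd = read_es_rdd(sc, conf)
#   rdd.first()   # returns a result
#   rdd.take(2)   # the call that hangs in my case
```

Keeping the configuration in one place makes it easier to run first() and take(2) against exactly the same settings when comparing the two calls.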

costin · March 21, 2016, 10:13am

Looks like a bug. EsInputFormat, or any InputFormat for that matter, should only be used from a single thread, yet that does not seem to be what is happening in your case.
Are you using Python or Scala?

Kramer_Li · March 23, 2016, 3:05am

Hi Costin

I am using Python. So this is a bug, and not because I'm doing something wrong, right?

Thanks very much

costin · April 5, 2016, 2:34pm

Sorry, I don't know enough Python to be able to help. It's likely that the Hadoop Input/Output formats are being used incorrectly, but looking at your code, you don't seem to be doing much, so maybe there's something else going on in Spark/Python.

Sorry...
