I'm trying to read data from Elasticsearch into Spark:
conf = {
    "es.resource": "sflow_*/sflow",
    "es.nodes": "ES01",
    "es.query": 'some query',
}
rdd = sc.newAPIHadoopRDD(
    "org.elasticsearch.hadoop.mr.EsInputFormat",
    "org.apache.hadoop.io.NullWritable",
    "org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf,
)
rdd.take(2)
After calling rdd.take(2), the process hangs and logs the following warning:
16/03/14 20:52:07 WARN httpclient.SimpleHttpConnectionManager: SimpleHttpConnectionManager being used
incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or
method is using this connection manager at a time.
However, rdd.first() always returns a result successfully. Does anyone know why?