Pyspark es.query not working. only default "match_all" works

buster · September 13, 2017, 7:47pm

In pypspark the only way I can get data returned from ES is by leaving es.query default. Why is this?

es_query = {"match" : {"key" : "value"}}
es_conf = {"es.nodes" : "localhost", "es.resource" : "index/type", "es.query" : json.dumps(es_query)}
rdd = sc.newAPIHadoopRDD(inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",keyClass="org.apache.hadoop.io.NullWritable",valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=es_conf)

rdd.count()
...
0
rdd.first()
ValueError: RDD is empty

Yet when,
es_query = {"match_all" : {}}

rdd.first()
(u'2017-09-01 01:02:03)

*I have tested the queries by directly querying elastic search and they work so it is something wrong with spark/es-hadoop.

james.baiera · October 4, 2017, 3:21am

@buster Are you specifying the query as a string or as a map of maps? It looks like you're omitting the quotes needed to make the query a string in your posted example.

system · November 1, 2017, 3:21am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to create a RDD from ElasticSearch using DSL Elasticsearch es-hadoop	2	918	July 6, 2017
ES ignores queries through Spark Elasticsearch	3	814	July 6, 2017
Es spark ignoring fields query parameter Elasticsearch	4	737	July 6, 2017
Query filter not working with SparkSql Elasticsearch es-hadoop	7	1591	March 2, 2017
ElasticSearch spark esRDD not returing the aggregate values in aggregated query Elasticsearch	2	1349	July 6, 2017

Pyspark es.query not working. only default "match_all" works

Related topics