Hi,
I have a problem retrieving the _parent field from Elasticsearch using PySpark. The RDD does not contain the _parent field even though I list it in "fields" in the query. My code:
import json
from pyspark import SparkContext

# Ask Elasticsearch to return _parent along with _source
es_query = {
    "fields": ["_parent", "_source"]
}
es_read_conf = {
    "es.nodes": "localhost",
    "es.resource": "crm/event",
    "es.nodes.wan.only": "true",
    "es.query": json.dumps(es_query),
    "es.read.metadata": "true"
}
es_rdd = SparkContext().newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_read_conf)
print(es_rdd.first())
The first element of the RDD looks like this:
(u'wn4U76ggQKKEKjBNEqiVyg', {u'isp': None, u'tags': None, u'url': None, u'ip': None, u'website_id': 4, u'_metadata': {u'_score': 0.0, u'_type': u'event', u'_id': u'wn4U76ggQKKEKjBNEqiVyg', u'_index': u'crm'}, u'create_timestamp': 1462300665000, u'fields': {}, u'type': u'met_scenario_condition', u'additional_data': {u'block_id': 261, u'scenario_id': 13, u'block_name': u'Warunek wej\u015bciowy', u'action_type': u'mail', u'scenario_name': u'SP Ostatnio przegl\u0105dane', u'action_id': 261}})
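To make the check explicit, here is a minimal sketch in plain Python of how a record like the one above can be searched for the field. The names find_parent and sample are made up for illustration, sample is a shortened copy of the record above, and the three places it looks (top level, "_metadata", "fields") are only my assumption about where _parent could show up.

```python
def find_parent(record):
    """Return the places in an (id, document) pair where a `_parent` key appears."""
    _doc_id, doc = record
    found = []
    if "_parent" in doc:
        found.append("top-level")
    # `or {}` guards against the value being None instead of a dict
    if "_parent" in (doc.get("_metadata") or {}):
        found.append("_metadata")
    if "_parent" in (doc.get("fields") or {}):
        found.append("fields")
    return found


# Shortened copy of the record printed above (hypothetical variable name)
sample = ("wn4U76ggQKKEKjBNEqiVyg", {
    "website_id": 4,
    "_metadata": {"_score": 0.0, "_type": "event",
                  "_id": "wn4U76ggQKKEKjBNEqiVyg", "_index": "crm"},
    "fields": {},
})

print(find_parent(sample))  # [] : no _parent anywhere in this record
```

On the real RDD the same function could be applied with es_rdd.map(find_parent), which in my case would give an empty list for every record.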
When I run the same query in elasticsearch-hammer, the _parent field is present in the response.
I've tried every version, including the alpha version. What's wrong?