Hi all,
I need to read timestamps from Elasticsearch into a Spark RDD or DataFrame. I can read the other fields (text, integer, ...), but not the timestamp.
Spark v2.4.7
Python: v3.6.11
Elasticsearch: v7.10.0
Here is my code (Python):
q = """{
"query": {
"match_all": {}
}
}"""
es_read_conf = {
"es.nodes": "localhost",
"es.port": "9200",
"es.resource": "testinterface3",
"es.query": q,
"es.mapping.date.rich": "false"
}
es_rdd = self.sc.newAPIHadoopRDD(
inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
keyClass="org.apache.hadoop.io.NullWritable",
valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
conf=es_read_conf)
for x in es_rdd.collect():
print(x)
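With "es.mapping.date.rich" set to "false", the connector is expected to hand dates over as plain strings rather than rich date objects, so I would then need to parse them on the Python side. As a minimal sketch (the sample timestamp string below is a hypothetical ISO-8601 date_nanos value, not from my actual data), parsing such a string while keeping the sub-microsecond digits could look like:

```python
from datetime import datetime, timezone

def parse_date_nanos(ts: str):
    """Parse an ISO-8601 date_nanos string such as
    '2020-12-01T10:15:30.123456789Z' into a (datetime, leftover_nanos)
    pair. Python's datetime only stores microsecond precision, so the
    last three fractional digits are returned separately."""
    base, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    frac = (frac + "000000000")[:9]           # pad fraction to 9 digits (nanoseconds)
    micros, nanos = int(frac[:6]), int(frac[6:])
    return dt.replace(microsecond=micros), nanos

dt, nanos = parse_date_nanos("2020-12-01T10:15:30.123456789Z")
```

This could then be applied in a map over the RDD values once the timestamp field actually comes through.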
ES mapping:

"testinterface3" : {
  "aliases" : { },
  "mappings" : {
    "properties" : {
      "hello" : {
        "type" : "text"
      },
      "interface" : {
        "type" : "keyword"
      },
      "pkts" : {
        "type" : "integer"
      },
      "timestamp" : {
        "type" : "date_nanos"
      }
    }
  }
}
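Since date_nanos values are stored internally as a long of nanoseconds since the Unix epoch, the field might also arrive as an epoch-nanos number rather than a string. A small helper for that case (the epoch value below is an illustrative example I chose, not from my data) could be:

```python
from datetime import datetime, timezone

def nanos_to_datetime(epoch_nanos: int) -> datetime:
    """Convert nanoseconds since the Unix epoch (how date_nanos is
    stored internally) to a UTC datetime. Precision below a
    microsecond is truncated, since datetime cannot represent it."""
    secs, nanos = divmod(epoch_nanos, 1_000_000_000)
    return datetime.fromtimestamp(secs, tz=timezone.utc).replace(
        microsecond=nanos // 1000)

dt = nanos_to_datetime(1606817730123456789)
```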
Output printed:
('testinterface3-0-668', {'hello': 'test', 'interface': 'testinterface3', 'pkts': 50})
('testinterface3-0-669', {'hello': 'test', 'interface': 'testinterface3', 'pkts': 52})
('testinterface3-0-670', {'hello': 'test', 'interface': 'testinterface3', 'pkts': 51})
Expected output: the same records, but with the timestamp field included in each value dict.
Thank you in advance.