Timestamp field being passed in epoch with Hadoop Library

Wayne_Taylor · July 5, 2018, 5:06pm

Hi Team,

I have followed the instructions to get my ORC data into Elasticsearch, but I'm having trouble with timestamps: my data source has no native timestamp field, and even after I convert one, the timestamp shows up in ES as a number.

Below are my steps:

1. Launch pyspark and pass in the Elasticsearch Hadoop JAR:
   Downloads/spark/bin/pyspark --jars ~/Downloads/elasticsearch-hadoop-6.3.0.jar
2. Create a DataFrame from a local ORC file:
   df = spark.read.format("orc").load("/Users/wtaylor/Downloads/TEST/*")
3. Register a temporary view so I can query the ORC data with SQL and aggregate it:
   df.createOrReplaceTempView("esexample")
4. Run my aggregation SQL against the view and cache the result:
   aggUrldf = spark.sql(aggSql).cache()

Note that in the SQL my source date field is epoch time in milliseconds, which I convert to a timestamp:
timestamp(from_unixtime(start_time/1000)) as start_time
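For reference, the conversion above can be mirrored in plain Python to see what it yields. This is a minimal sketch, not Spark itself: it assumes the Spark session time zone is UTC (from_unixtime formats in the session time zone) and the default yyyy-MM-dd HH:mm:ss pattern, and the helper name is hypothetical. It also shows that going through from_unixtime keeps only whole seconds, so any millisecond part is dropped.

```python
from datetime import datetime, timezone


def mirror_from_unixtime(epoch_ms: int) -> str:
    """Approximate Spark's from_unixtime(start_time/1000) in plain Python.

    Hypothetical helper. Assumes a UTC session time zone and the default
    'yyyy-MM-dd HH:mm:ss' output pattern.
    """
    # Keep only whole seconds; for non-negative epochs this matches the
    # truncation that happens when the fractional value becomes a bigint.
    whole_seconds = epoch_ms // 1000
    return datetime.fromtimestamp(whole_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(mirror_from_unixtime(1530444648000))  # 2018-07-01 11:30:48
print(mirror_from_unixtime(1530444648999))  # 2018-07-01 11:30:48 (milliseconds lost)
```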

I then write to ES using the following:
(aggUrldf.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes.wan.only", "true")
    .option("es.nodes", esUrl)
    .option("es.net.http.auth.user", esUser)
    .option("es.net.http.auth.pass", esPassword)
    .mode("Overwrite")
    .save("indexname/doctype"))
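Because the timestamp arrives in ES as a plain number, one possible workaround is to send it as an ISO-8601 string instead: with Elasticsearch's default dynamic date detection, a string matching strict_date_optional_time is mapped as a date. In Spark SQL that would mean formatting the column (for example with date_format) rather than casting it to a timestamp. The plain-Python sketch below only illustrates the target string shape; the helper name is hypothetical and UTC is assumed.

```python
from datetime import datetime, timezone


def epoch_ms_to_iso8601(epoch_ms: int) -> str:
    """Render epoch milliseconds as an ISO-8601 string (hypothetical helper).

    Assumes the instant should be expressed in UTC.
    """
    seconds, millis = divmod(epoch_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)
    # timespec="milliseconds" keeps the sub-second part that from_unixtime drops
    return dt.isoformat(timespec="milliseconds")


print(epoch_ms_to_iso8601(1530444648000))  # 2018-07-01T11:30:48.000+00:00
```

A string like this is unambiguous about its time zone, which avoids relying on how the connector serializes Spark's TimestampType.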

I verified that the data is in ES, but start_time is stored as a numeric epoch value (milliseconds). Example document:

{
  "_index": "indexname",
  "_type": "test",
  "_id": "qFdja2QBCIhbyqjdz7hd",
  "_score": 1,
  "_source": {
    "id": "21590385",
    "origination_airport": "KCLT",
    "destination_airport": "KSEA",
    "start_time": 1530444648000,
    "client_ip": "10.34.11.162",
    "url": "gateway.icloud.com",
    "rx_total_bytes": 828,
    "tx_total_bytes": 2578
  }
}
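The document above shows start_time as a JSON number, and under dynamic mapping Elasticsearch maps a JSON number as long. Another possible workaround is to create the index with an explicit mapping before writing, declaring start_time as a date so the epoch-millisecond number is indexed as a date. The sketch below only builds the request body for `PUT indexname` on Elasticsearch 6.x (which still expects a mapping type name). The helper name is hypothetical, the index and type names are the placeholders from the save("indexname/doctype") call, and whether mode("Overwrite") preserves a pre-created mapping is an assumption worth checking.

```python
import json


def build_index_body(doc_type: str = "doctype") -> dict:
    """Request body for creating the index on Elasticsearch 6.x (hypothetical helper).

    Declares start_time as a date so a numeric epoch-milliseconds value is
    indexed as a date instead of being dynamically mapped as long.
    """
    return {
        "mappings": {
            doc_type: {  # 6.x indices take a single mapping type name
                "properties": {
                    # epoch_millis matches values like 1530444648000
                    "start_time": {"type": "date", "format": "epoch_millis"},
                }
            }
        }
    }


print(json.dumps(build_index_body(), indent=2))
```

Fields not listed in the mapping would still be mapped dynamically, so only start_time changes.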

I was unable to get any combination of the settings in the Elasticsearch for Apache Hadoop configuration documentation to work.

Any ideas?

Thanks
Wayne

Wayne_Taylor · July 17, 2018, 12:33pm

After working with the ES team on GitHub, it turned out this is a bug: https://github.com/elastic/elasticsearch-hadoop/issues/1173

system · August 14, 2018, 12:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category | Replies | Views | Activity
PySpark - How to read timestamp date_nanos from ElasticSearch to Spark? | Elasticsearch / es-hadoop | 2 | 708 | December 8, 2021
ElasticSearch Spark | Elasticsearch / es-hadoop | 3 | 969 | July 6, 2017
Trouble with Timestamp format "dd-MM-yyyy HH:mm:ss" | Elasticsearch | 5 | 3320 | July 6, 2017
Date format issue when passing data from spark to ElasticSearch | Elasticsearch / es-hadoop | 4 | 3975 | September 17, 2019
Elasticsearch 2.0 and Spark - TimestampType conversion issue | Elasticsearch / es-hadoop | 5 | 1779 | July 6, 2017