Date format issue when passing data from spark to ElasticSearch

Yoav_Ben_Moha · August 19, 2019, 9:13am

I have successfuly uploaded data from spark into elasticsearch 7.

The date format i have in the csv that i am reading are: yyyy-MM-dd+HH:mm:ss.SSS

  val cdr=spark.read.option("delimiter", ";")
      .option("timestampFormat","yyyy-MM-dd+HH:mm:ss.SSS")
      .schema(customSchema)
      .csv("/data/staging/abc")

Index created in elasticsearch didnt recocnized those values as date.
It was automaically maped by ES as:

"ING_CALL_ANSWER_TIME": {
"type": "long"

Does the above format is not a valid date in ES ?

Thanks

spinscale · August 19, 2019, 9:36am

this is not a standard date format it seems. The ISO8601 usually has a T between date and time and not a plus sign. You need to explicitely configure this format in the mapping for your index.

Yoav_Ben_Moha · August 19, 2019, 7:28pm

Hi,
I would like to add some more information to my case .....

I am using elasticsearch-spark-20_2.11-7.0.1.jar and commons-httpclient-3.1.jar connectors to connect spark session to E.S
In my spark dataframe all the date fields format are as follow:

 val cdr=spark.read.option("delimiter", ";")
      .option("timestampFormat","yyyy-MM-dd+HH:mm:ss.SSS")
      .csv("/tmp/FlumeData.1559132375639")

cdr.select("GLOBAL_CALL_IDENTIFIER","RECORD_SEQUENCE_IDENTIFIER","cdr_status","ing_signal_start_time").show(false)

 +----------------------+--------------------------+----------+-----------------------+
|GLOBAL_CALL_IDENTIFIER|RECORD_SEQUENCE_IDENTIFIER|cdr_status|ing_signal_start_time  |
+----------------------+--------------------------+----------+-----------------------+
|AAAALlzX5/OBgAABNF9s9g|6602105                   |U         |2019-05-12 09:31:31.735|
|AAAALlzX5/SBgAABNF9tIQ|6602106                   |U         |2019-05-12 09:31:32.469|

As you can see the date format of ing_signal_start_time is in format yyyy-MM-dd HH:mm:ss.SSS

Before running the spark job, I have manually create the and mapped the index on E.S :

PUT hadoop
{
  "mappings": 
  {
      "properties": {
      "CALLED_NUMBER": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "CALLING_NUMBER": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      ......
      "ING_SIGNAL_START_TIME": {
           "type":"date",
            "format":"yyyy/MM/dd HH:mm:ss ZZZ"
      },
      .....

Exection of spark job failed and shows that : "failed to parse date field [1559118455246] with format [yyyy/MM/dd HH:mm:ss ZZZ]"

Questions:

Does the elasticsearch-spark-20_2.11-7.0.1.jar is trying to "convert" the date value to some integer/unixtime value ?
Does one can suggest from where the value "1559118455246" came from ? I have no such value in my csv file also 1559118455246 is not equivalent to any unixtime date.
The same senario is happen for each and any date field as you can see bellow
those values such as 1559118455246 are assign to the "date" fileds even if i am drooping the index and let E.S to automatically map the datatypes

The log erros is:

org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse field [ING_SIGNAL_START_TIME] of type [date] 
in document with id 'WUwnq2wBsHV3tPEc_2qs';org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: 
failed to parse date field [1559118455246] with format [yyyy/MM/dd HH:mm:ss ZZZ];org.elasticsearch.hadoop.rest.EsHadoopRemoteException: 
date_time_parse_exception: Text '1559118455246' could not be parsed at index 0
{"index":{}}
{"YEAR":2019,"MONTH":5,"DAY":29,"HOUR":11,"ING_SIGNAL_START_TIME":1559118455246,
"CALLING_NUMBER":"339700000","CALLED_NUMBER":"334500000","CALL_DURATION":0.0,
"CDR_STATUS":"U","CALL_TYPE":"3",
"ING_CALL_RELEASE_TIME":1559118457210,
"ING_RELEASE_COMPLETE_TIME":1559118457213,
"EGR_CALL_RELEASE_TIME":1559118457208,
"EGR_RELEASE_COMPLETE_TIME":1559118457208,
"country":"France",
"latitude":46.22764,
"longitude":2.213749}

Yoav_Ben_Moha · August 20, 2019, 6:04pm

Hi,
Problem solve .
The number : 1559118455246 represent timestamp including milisecond
All you i had to do in order to see the value in date format in kibana is to recreate the index manually by setting the date fields with no format .
e.g:

PUT <index_name>
{
  "mappings": 
  {
      "properties": {
      ......
      "the_date_field": {
           "type":"date"
      },

HTH

Thanks

system · September 17, 2019, 6:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Trouble with Timestamp format '"dd-MM-yyyy HH:mm:ss" Elasticsearch	5	3324	July 6, 2017
Wrong date format when reading from elasticsearch index Elasticsearch	7	1071	July 23, 2020
Strange date behaviour when indexing content Elasticsearch	2	447	July 5, 2017
What's the actual data type for a field defined as "date" with "dateOptionalTime" Elasticsearch es-hadoop	3	1094	March 14, 2017
Date format issue when reading data from ElasticSearch Elasticsearch	1	359	July 9, 2020

Date format issue when passing data from spark to ElasticSearch

Related topics