Date format issue when passing data from spark to ElasticSearch

I have successfuly uploaded data from spark into elasticsearch 7.

The date format i have in the csv that i am reading are: yyyy-MM-dd+HH:mm:ss.SSS

  val cdr=spark.read.option("delimiter", ";")
      .option("timestampFormat","yyyy-MM-dd+HH:mm:ss.SSS")
      .schema(customSchema)
      .csv("/data/staging/abc")

Index created in elasticsearch didnt recocnized those values as date.
It was automaically maped by ES as:

"ING_CALL_ANSWER_TIME": {
"type": "long"

Does the above format is not a valid date in ES ?

Thanks

this is not a standard date format it seems. The ISO8601 usually has a T between date and time and not a plus sign. You need to explicitely configure this format in the mapping for your index.

Hi,
I would like to add some more information to my case .....

I am using elasticsearch-spark-20_2.11-7.0.1.jar and commons-httpclient-3.1.jar connectors to connect spark session to E.S
In my spark dataframe all the date fields format are as follow:

 val cdr=spark.read.option("delimiter", ";")
      .option("timestampFormat","yyyy-MM-dd+HH:mm:ss.SSS")
      .csv("/tmp/FlumeData.1559132375639")

cdr.select("GLOBAL_CALL_IDENTIFIER","RECORD_SEQUENCE_IDENTIFIER","cdr_status","ing_signal_start_time").show(false)

 +----------------------+--------------------------+----------+-----------------------+
|GLOBAL_CALL_IDENTIFIER|RECORD_SEQUENCE_IDENTIFIER|cdr_status|ing_signal_start_time  |
+----------------------+--------------------------+----------+-----------------------+
|AAAALlzX5/OBgAABNF9s9g|6602105                   |U         |2019-05-12 09:31:31.735|
|AAAALlzX5/SBgAABNF9tIQ|6602106                   |U         |2019-05-12 09:31:32.469|

As you can see the date format of ing_signal_start_time is in format yyyy-MM-dd HH:mm:ss.SSS

Before running the spark job, I have manually create the and mapped the index on E.S :

PUT hadoop
{
  "mappings": 
  {
      "properties": {
      "CALLED_NUMBER": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "CALLING_NUMBER": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      ......
      "ING_SIGNAL_START_TIME": {
           "type":"date",
            "format":"yyyy/MM/dd HH:mm:ss ZZZ"
      },
      .....      

Exection of spark job failed and shows that : "failed to parse date field [1559118455246] with format [yyyy/MM/dd HH:mm:ss ZZZ]"

Questions:

  1. Does the elasticsearch-spark-20_2.11-7.0.1.jar is trying to "convert" the date value to some integer/unixtime value ?
  2. Does one can suggest from where the value "1559118455246" came from ? I have no such value in my csv file also 1559118455246 is not equivalent to any unixtime date.
  3. The same senario is happen for each and any date field as you can see bellow
  4. those values such as 1559118455246 are assign to the "date" fileds even if i am drooping the index and let E.S to automatically map the datatypes

The log erros is:

org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse field [ING_SIGNAL_START_TIME] of type [date] 
in document with id 'WUwnq2wBsHV3tPEc_2qs';org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: 
failed to parse date field [1559118455246] with format [yyyy/MM/dd HH:mm:ss ZZZ];org.elasticsearch.hadoop.rest.EsHadoopRemoteException: 
date_time_parse_exception: Text '1559118455246' could not be parsed at index 0
{"index":{}}
{"YEAR":2019,"MONTH":5,"DAY":29,"HOUR":11,"ING_SIGNAL_START_TIME":1559118455246,
"CALLING_NUMBER":"339700000","CALLED_NUMBER":"334500000","CALL_DURATION":0.0,
"CDR_STATUS":"U","CALL_TYPE":"3",
"ING_CALL_RELEASE_TIME":1559118457210,
"ING_RELEASE_COMPLETE_TIME":1559118457213,
"EGR_CALL_RELEASE_TIME":1559118457208,
"EGR_RELEASE_COMPLETE_TIME":1559118457208,
"country":"France",
"latitude":46.22764,
"longitude":2.213749}

Hi,
Problem solve .
The number : 1559118455246 represent timestamp including milisecond
All you i had to do in order to see the value in date format in kibana is to recreate the index manually by setting the date fields with no format .
e.g:

PUT <index_name>
{
  "mappings": 
  {
      "properties": {
      ......
      "the_date_field": {
           "type":"date"
      },

HTH

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.