Hi,
I would like to add some more information to my case .....
I am using elasticsearch-spark-20_2.11-7.0.1.jar and commons-httpclient-3.1.jar connectors to connect spark session to E.S
In my spark dataframe all the date fields format are as follow:
val cdr=spark.read.option("delimiter", ";")
.option("timestampFormat","yyyy-MM-dd+HH:mm:ss.SSS")
.csv("/tmp/FlumeData.1559132375639")
cdr.select("GLOBAL_CALL_IDENTIFIER","RECORD_SEQUENCE_IDENTIFIER","cdr_status","ing_signal_start_time").show(false)
+----------------------+--------------------------+----------+-----------------------+
|GLOBAL_CALL_IDENTIFIER|RECORD_SEQUENCE_IDENTIFIER|cdr_status|ing_signal_start_time |
+----------------------+--------------------------+----------+-----------------------+
|AAAALlzX5/OBgAABNF9s9g|6602105 |U |2019-05-12 09:31:31.735|
|AAAALlzX5/SBgAABNF9tIQ|6602106 |U |2019-05-12 09:31:32.469|
As you can see the date format of ing_signal_start_time is in format yyyy-MM-dd HH:mm:ss.SSS
Before running the spark job, I have manually create the and mapped the index on E.S :
PUT hadoop
{
"mappings":
{
"properties": {
"CALLED_NUMBER": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"CALLING_NUMBER": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
......
"ING_SIGNAL_START_TIME": {
"type":"date",
"format":"yyyy/MM/dd HH:mm:ss ZZZ"
},
.....
Exection of spark job failed and shows that : "failed to parse date field [1559118455246] with format [yyyy/MM/dd HH:mm:ss ZZZ]"
Questions:
- Does the elasticsearch-spark-20_2.11-7.0.1.jar is trying to "convert" the date value to some integer/unixtime value ?
- Does one can suggest from where the value "1559118455246" came from ? I have no such value in my csv file also 1559118455246 is not equivalent to any unixtime date.
- The same senario is happen for each and any date field as you can see bellow
- those values such as 1559118455246 are assign to the "date" fileds even if i am drooping the index and let E.S to automatically map the datatypes
The log erros is:
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse field [ING_SIGNAL_START_TIME] of type [date]
in document with id 'WUwnq2wBsHV3tPEc_2qs';org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception:
failed to parse date field [1559118455246] with format [yyyy/MM/dd HH:mm:ss ZZZ];org.elasticsearch.hadoop.rest.EsHadoopRemoteException:
date_time_parse_exception: Text '1559118455246' could not be parsed at index 0
{"index":{}}
{"YEAR":2019,"MONTH":5,"DAY":29,"HOUR":11,"ING_SIGNAL_START_TIME":1559118455246,
"CALLING_NUMBER":"339700000","CALLED_NUMBER":"334500000","CALL_DURATION":0.0,
"CDR_STATUS":"U","CALL_TYPE":"3",
"ING_CALL_RELEASE_TIME":1559118457210,
"ING_RELEASE_COMPLETE_TIME":1559118457213,
"EGR_CALL_RELEASE_TIME":1559118457208,
"EGR_RELEASE_COMPLETE_TIME":1559118457208,
"country":"France",
"latitude":46.22764,
"longitude":2.213749}