java.lang.IllegalArgumentException: Invalid format: "Fri Oct 09 02:06:12 +0000 2015"

Ramdev_Wudali · October 21, 2015, 3:01pm

Hi,
I am running ES 1.5.2 with es-hadoop and es-spark 2.1.1 (with Spark 1.5.1 running in local mode).

My application can "successfully" connect and read the content. The data I am trying to read is tweets that have been indexed in my instance, and the field I am trying to read is created_at, which has the format "Fri Oct 09 02:06:12 +0000 2015".
However, when I do this:

val firstDoc = esResults.first()

I get the following error:

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot invoke method public org.joda.time.DateTime org.joda.time.format.DateTimeFormatter.parseDateTime(java.lang.String)

I looked at the closed issue https://github.com/elastic/elasticsearch-hadoop/issues/458, which was fixed as part of 2.1.0, so I expected the fix to be part of 2.1.1 as well. But it does not seem to have fixed this case.

Any suggestions?

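Here is the code snippet:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds esRDD to SparkContext

// `master` is defined elsewhere in the application (local mode here)
val conf = new SparkConf().setAppName(this.getClass.getSimpleName).setMaster(master)
conf.set("es.nodes", "esHost")
conf.set("es.port", "7200")

val sc = new SparkContext(conf)

// Read the tweets/twitter index/type as an RDD of (id, document) pairs
val esResults = sc.esRDD("tweets/twitter")

// This is the call that fails with the exception above
val firstDoc = esResults.first()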

costin · October 21, 2015, 3:31pm

Can you post the full stack trace?
The issue is caused by the fact that the date you pass is not in ISO 8601 format. The bug that you refer to doesn't apply here, since in that case the data was obeying the ISO format.

P.S. Furthermore, in your case the Joda library is detected and used instead, which provides richer and more lenient parsing, yet it still fails.
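To make the mismatch concrete, here is a minimal sketch using only the JDK's java.time classes (Java 8+ is an assumption; the thread itself uses Joda). The ISO parser below merely stands in for the connector's ISO-based parsing and is not the connector's actual code path. The object and method names are hypothetical.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object CreatedAtFormats {
  // The value from the thread, in the layout Twitter uses for created_at.
  val raw = "Fri Oct 09 02:06:12 +0000 2015"

  // An ISO 8601 parser; it expects something like 2015-10-09T02:06:12+00:00.
  def parseIso(s: String): Try[OffsetDateTime] =
    Try(OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME))

  // A pattern matching the Twitter layout. Locale.ENGLISH is needed so that
  // "Fri" and "Oct" parse regardless of the JVM's default locale.
  private val twitterFormat =
    DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH)

  def parseTwitter(s: String): Try[OffsetDateTime] =
    Try(OffsetDateTime.parse(s, twitterFormat))
}
```

With the value from the thread, CreatedAtFormats.parseIso(CreatedAtFormats.raw) is a Failure, while CreatedAtFormats.parseTwitter(CreatedAtFormats.raw) succeeds and yields 2015-10-09T02:06:12Z.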

Ramdev_Wudali · October 21, 2015, 3:52pm

As requested, here is the full stack trace as a gist (fullStackTraceForException):

https://gist.github.com/agastya71/73613e2182364b97e7c2

Connected to the target VM, address: '127.0.0.1:64009', transport: 'socket'
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [com.trgr.platform.riptide] additivity to [true].
log4j: Level value for com.trgr.platform.riptide is  [WARN].
log4j: com.trgr.platform.riptide level set to WARN
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [log4j.logger.org.eclipse.jetty] additivity to [true].
log4j: Level value for log4j.logger.org.eclipse.jetty is  [ERROR].


costin · October 21, 2015, 4:49pm

Yup, the exception confirms it:

java.lang.IllegalArgumentException: Invalid format: "Fri Oct 09 02:06:12 +0000 2015"
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:899)

Ramdev_Wudali · October 21, 2015, 4:57pm

So, how do I get around this particular issue? Do I need to write a custom parser and pass it in to deserialize the esRDD?

costin · October 21, 2015, 5:01pm

I presume created_at is mapped as a date field with a custom date format, right?
That's why the connector tries to read this field as a date using the dateOptionalTime format (which the field clearly isn't obeying).

See this section of the docs; it means adding the default format to the field.
Is there any particular reason why you are not using the default ISO 8601 format?

Ramdev_Wudali · October 21, 2015, 5:51pm

The reason is that we want to preserve the original tweet as we received it without changing date formats and such.
Thanks for the pointer about adding default formats.

Ramdev

Ramdev_Wudali · October 21, 2015, 6:51pm

Hi Costin,
I am not able to figure out how to provide my mapping to SparkConf so that esRDD can apply the right date-time format to the field in question. Are there examples I can look at that show how a custom mapping is to be provided?

Thanks much

Ramdev

ebuildy · December 16, 2015, 7:35pm

Hi guys,

Is there any example of this?

Cheers,
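
One possible workaround, sketched here under several assumptions rather than taken from the thread: es-hadoop has an es.mapping.date.rich setting which, when set to false, returns date fields as primitives (String or long) instead of parsing them. Assuming the connector version in use supports it, created_at then arrives as the raw string and can be parsed in the application, which also keeps the original tweet untouched. The object name, the createdAt helper, and the use of java.time (Java 8+) are all hypothetical choices for illustration.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object RawCreatedAt {
  // Settings to copy into the SparkConf from the original post.
  // "es.mapping.date.rich" -> "false" asks the connector to hand back date
  // fields as primitives instead of building date objects (assumption: the
  // es-hadoop version in use supports this setting).
  val esSettings: Map[String, String] = Map(
    "es.nodes" -> "esHost",
    "es.port" -> "7200",
    "es.mapping.date.rich" -> "false"
  )

  // Pattern for the Twitter layout, e.g. "Fri Oct 09 02:06:12 +0000 2015".
  private val twitterFormat =
    DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH)

  // Hypothetical helper: extracts created_at from one document and parses it.
  // Returns None if the field is missing, not a String, or does not match.
  def createdAt(doc: collection.Map[String, AnyRef]): Option[OffsetDateTime] =
    doc.get("created_at")
      .collect { case s: String => s }
      .flatMap(s => Try(OffsetDateTime.parse(s, twitterFormat)).toOption)
}
```

In the application, the settings would be applied with RawCreatedAt.esSettings.foreach { case (k, v) => conf.set(k, v) } before the SparkContext is created, and the documents post-processed with something like sc.esRDD("tweets/twitter").map { case (id, doc) => (id, RawCreatedAt.createdAt(doc)) }, assuming esRDD yields (id, document map) pairs.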
