java.lang.IllegalArgumentException: Invalid format: "Fri Oct 09 02:06:12 +0000 2015"


(Ramdev Wudali) #1

Hi:
I am running ES 1.5.2 with es-hadoop and es-spark 2.1.1, and Spark 1.5.1 running in local mode.

My application can "successfully" connect and read the content. The data I am trying to read is tweets that have been indexed in my instance, and I am trying to read the created_at field, which has the format "Fri Oct 09 02:06:12 +0000 2015".
However, when I do this:

val firstDoc = esResults.first() 

I get the following error :

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot invoke method public org.joda.time.DateTime org.joda.time.format.DateTimeFormatter.parseDateTime(java.lang.String)

I looked at the closed issue https://github.com/elastic/elasticsearch-hadoop/issues/458, which was fixed as part of 2.1.0 (so I expected the fix to be part of 2.1.1), but it does not seem to have fixed this case.

Any suggestions?

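Here is the code snippet:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds esRDD to SparkContext

// master is defined elsewhere in the application
val conf = new SparkConf().setAppName(this.getClass.getSimpleName).setMaster(master)
conf.set("es.nodes", "esHost")
conf.set("es.port", "7200")

val sc = new SparkContext(conf)

// Read the tweets/twitter resource as an RDD of (id, document) pairs
val esResults = sc.esRDD("tweets/twitter")

// This is the call that fails with the error above
val firstDoc = esResults.first()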


(Costin Leau) #2

Can you post the full stack trace?
The issue is caused by the fact that the date you pass is not in ISO 8601 format. The bug you refer to doesn't apply here, since in that case the data did follow the ISO format.

P.S. Furthermore, in your case the Joda library is detected and used instead, which provides richer and more lenient parsing, yet it still fails.


(Ramdev Wudali) #3

As requested, here is the gist with the full stack trace:


(Costin Leau) #4

Yup, the exception confirms it:

java.lang.IllegalArgumentException: Invalid format: "Fri Oct 09 02:06:12 +0000 2015"
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:899)
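
The failure above can be reproduced outside Spark. The following is a minimal sketch, assuming joda-time is on the classpath and assuming the connector's parser behaves like Joda's lenient ISO parser (`dateOptionalTimeParser`); the stack trace does not confirm which formatter is used. The object name `TwitterDateCheck` and the explicit pattern are illustrative, derived from the sample value in this thread.

```scala
import java.util.Locale
import org.joda.time.DateTime
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter, ISODateTimeFormat}
import scala.util.Try

object TwitterDateCheck {
  // Sample value copied from the thread.
  val sample = "Fri Oct 09 02:06:12 +0000 2015"

  // Lenient ISO 8601 parser (date with optional time).
  // Assumption: this approximates what the connector applies to date fields.
  val isoParser: DateTimeFormatter = ISODateTimeFormat.dateOptionalTimeParser()

  // Explicit pattern for the created_at value; the English locale is needed
  // for the "Fri" and "Oct" tokens, and withOffsetParsed keeps the +0000 offset.
  val twitterFormat: DateTimeFormatter =
    DateTimeFormat.forPattern("EEE MMM dd HH:mm:ss Z yyyy")
      .withLocale(Locale.ENGLISH)
      .withOffsetParsed()

  def parseIso(s: String): Try[DateTime] = Try(isoParser.parseDateTime(s))

  def parseTwitter(s: String): Try[DateTime] = Try(twitterFormat.parseDateTime(s))

  def main(args: Array[String]): Unit = {
    println(parseIso(sample))     // a Failure wrapping IllegalArgumentException
    println(parseTwitter(sample)) // a Success holding the parsed DateTime
  }
}
```

The point is only that the value itself is well-formed under an explicit pattern; it is the ISO parser that rejects it.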

(Ramdev Wudali) #5

So, how do I get around this particular issue? Do I need to write a custom parser and pass it in to deserialize the esRDD?


(Costin Leau) #6

I presume created_at is mapped as a date field with a custom date format, right?
That's why the connector tries to read this field as a date using the dateOptionalTime format (which the field clearly isn't obeying).

See this section of the docs - it means adding the default format to the field.
Any particular reason why you are not using the default ISO 8601 format?
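
For reference, a mapping that lists both formats might look like the sketch below. Elasticsearch date fields accept several formats separated by `||`, and `dateOptionalTime` is the name of the default format in ES 1.x mappings. The type name `twitter` comes from the `tweets/twitter` resource used earlier in the thread; the custom pattern and the object name `CreatedAtMapping` are assumptions based on the sample value. Whether this change alone alters what the connector does on read is not confirmed in this thread.

```scala
object CreatedAtMapping {
  // Assumed pattern for values such as "Fri Oct 09 02:06:12 +0000 2015".
  val twitterPattern = "EEE MMM dd HH:mm:ss Z yyyy"

  // Name of the default (ISO 8601, optional time) format in ES 1.x.
  val defaultFormat = "dateOptionalTime"

  // Mapping body for the "twitter" type: the custom pattern first,
  // then the default format as a fallback.
  val mappingJson: String =
    s"""{
       |  "twitter": {
       |    "properties": {
       |      "created_at": {
       |        "type": "date",
       |        "format": "${twitterPattern}||${defaultFormat}"
       |      }
       |    }
       |  }
       |}""".stripMargin

  def main(args: Array[String]): Unit = println(mappingJson)
}
```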


(Ramdev Wudali) #7

The reason is that we want to preserve the original tweet as we received it without changing date formats and such.
Thanks for the pointer about adding default formats.

Ramdev


(Ramdev Wudali) #8

Hi Costin:
I am not able to figure out how to provide my mapping to SparkConf so that esRDD applies the right date-time format to the field in question. Are there examples I can look at that show how a custom mapping is to be provided?

Thanks much

Ramdev
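
The thread does not show a way to hand a mapping to SparkConf. One possible direction, sketched below under assumptions, is to parse the field on the Spark side instead. This only works if the connector delivers `created_at` as a plain string; the es-hadoop documentation describes a setting `es.mapping.date.rich` (set to `false`) for that purpose, but whether it is available in 2.1.1 is not confirmed here. The object `TweetDates` and its method are hypothetical names, and joda-time is assumed to be on the classpath.

```scala
import java.util.Locale
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat
import scala.collection.Map

object TweetDates {
  // Assumed pattern for values such as "Fri Oct 09 02:06:12 +0000 2015".
  private val twitterFormat =
    DateTimeFormat.forPattern("EEE MMM dd HH:mm:ss Z yyyy")
      .withLocale(Locale.ENGLISH)
      .withOffsetParsed()

  // Returns the parsed created_at when the document carries it as a String,
  // None when the field is absent or has another type.
  // A String that does not match the pattern still throws IllegalArgumentException.
  def createdAt(doc: Map[String, AnyRef]): Option[DateTime] =
    doc.get("created_at").collect {
      case s: String => twitterFormat.parseDateTime(s)
    }
}
```

With the snippet from the first post, this might be applied as `esResults.mapValues(doc => TweetDates.createdAt(doc))`, which leaves the stored tweet untouched and converts the date only on read.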


(Thomas Decaux) #9

Hi guys,

Is there any example of this?

Cheers,

