Hi:
I am running ES 1.5.2 with es-hadoop and es-spark 2.1.1, and Spark 1.5.1 running in local mode.
My application can "successfully" connect and read the content. The data I am trying to read is tweets that have been indexed in my instance, and I am trying to read the created_at field, which has the format "Fri Oct 09 02:06:12 +0000 2015".
However, when I do this:
val firstDoc = esResults.first()
I get the following error :
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot invoke method public org.joda.time.DateTime org.joda.time.format.DateTimeFormatter.parseDateTime(java.lang.String)
Here is the code snippet:
val conf = new SparkConf().setAppName(this.getClass.getSimpleName).setMaster(master)
conf.set("es.nodes","esHost")
conf.set("es.port","7200")
Can you post the full stack trace?
The issue is caused by the fact that the date you pass is not in ISO 8601 format. The bug that you refer to doesn't apply here, since in that case the data was obeying the ISO format.
P.S. Furthermore, in your case the Joda library is detected and used instead, which provides richer and more lenient parsing, yet it still fails.
I presume created_at is mapped as a date field with a custom date format, right?
That's why the connector tries to read this field as a date using the dateOptionalTime format (which the field clearly isn't obeying).
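To illustrate the mismatch, here is a minimal sketch that can be run outside Spark. It uses java.time rather than Joda so that it needs no extra dependencies, and the pattern "EEE MMM dd HH:mm:ss Z yyyy" is an assumption about the tweet layout, not something taken from the connector. The object name TweetDateCheck is hypothetical.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object TweetDateCheck {
  // Sample value quoted in the question.
  val createdAt = "Fri Oct 09 02:06:12 +0000 2015"

  // Assumed pattern for the tweet's created_at layout (hypothetical, for illustration).
  val twitterFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH)

  // An ISO 8601 parser rejects the value...
  val asIso: Try[OffsetDateTime] =
    Try(OffsetDateTime.parse(createdAt, DateTimeFormatter.ISO_OFFSET_DATE_TIME))

  // ...while a parser built from the explicit pattern accepts it.
  val asTwitter: Try[OffsetDateTime] =
    Try(OffsetDateTime.parse(createdAt, twitterFormat))

  def main(args: Array[String]): Unit = {
    println(s"ISO 8601 parse succeeded: ${asIso.isSuccess}")
    println(s"Custom pattern parse succeeded: ${asTwitter.isSuccess}")
  }
}
```

The same idea applies on the connector side: a parser that expects ISO 8601 cannot read this value unless it is told about the custom pattern.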
See this section of the docs - it means adding the default format to the field.
Any particular reason why you are not using the default ISO 8601 format?
The reason is that we want to preserve the original tweet as we received it without changing date formats and such.
Thanks for the pointer about adding default formats.
Hi Costin:
I am not able to figure out how to provide my mapping to SparkConf so that esRDD can apply the right date-time format to the field in question. Are there examples I can look at that show how a custom mapping should be provided?