Hello,

I'm trying to build an RDD from Elasticsearch data with elasticsearch-spark 2.1.0.Beta4. I have the following field mapping:

"date": {"type": "date", "format": "basic_date_time"},

But when I try to read it with Spark via this snippet:

val rdd = sc.esRDD("analytic/docs", "?q=*")
rdd.take(10)

I get the following exception:
15/05/11 15:32:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: 20150506T110434.925+0200
	at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
	at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
	at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
	at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
	at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
	at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
	at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
	at org.elasticsearch.spark.serialization.ScalaValueReader.parseDate(ScalaValueReader.scala:113)
	at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:106)
	at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:106)
	at org.elasticsearch.spark.serialization.ScalaValueReader.checkNull(ScalaValueReader.scala:58)
	at org.elasticsearch.spark.serialization.ScalaValueReader.date(ScalaValueReader.scala:106)
	at org.elasticsearch.spark.serialization.ScalaValueReader.readValue(ScalaValueReader.scala:46)
	at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:540)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:528)
	at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
	at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:339)
	at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:290)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:185)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:164)
	at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:403)
	at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
	at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$33.apply(RDD.scala:1177)
	at org.apache.spark.rdd.RDD$$anonfun$33.apply(RDD.scala:1177)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
From my understanding, the basic_date_time format (pattern yyyyMMdd'T'HHmmss.SSSZ) should match this date value, "20150506T110434.925+0200". Is it possible to pass the pattern properly to the Spark context?
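For reference, the failure reproduces entirely outside Spark: javax.xml.bind.DatatypeConverter.parseDateTime (which ScalaValueReader.parseDate calls, per the trace) only accepts the XML Schema xs:dateTime lexical form, with dashes and colons, not the compact basic_date_time form. A minimal sketch in plain Scala, using my failing value (the expanded form below is just that same value rewritten by hand):

import javax.xml.bind.DatatypeConverter
import java.text.SimpleDateFormat

object DateParseCheck extends App {
  // The expanded ISO 8601 / xs:dateTime form parses fine...
  val ok = DatatypeConverter.parseDateTime("2015-05-06T11:04:34.925+02:00")
  println(ok.getTime)

  // ...but the compact basic_date_time form throws the same
  // IllegalArgumentException as in the stack trace above.
  try DatatypeConverter.parseDateTime("20150506T110434.925+0200")
  catch {
    case e: IllegalArgumentException => println(s"rejected: ${e.getMessage}")
  }

  // The same value parses cleanly once the basic_date_time pattern is
  // supplied explicitly ('Z' accepts RFC 822 offsets like +0200).
  val basic = new SimpleDateFormat("yyyyMMdd'T'HHmmss.SSSZ")
  println(basic.parse("20150506T110434.925+0200"))
}

So the value itself is well-formed for the index's declared format; it is only the connector's fixed xs:dateTime parser that rejects it.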
It was with Spark 1.2.0-cdh5.3.2 (in YARN cluster mode) and Spark 1.3.0 standalone (the developer environment). I have tried with Java 1.7.0_55 and Java 8u45. I will try again with Spark 1.3.1 standalone and will raise an issue if it still persists.
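In the meantime, the workaround I plan to try is to skip the connector's date parsing altogether: if the es.mapping.date.rich setting is honoured by this connector version, it should make date fields come back as plain strings that can be parsed manually. An untested sketch:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Workaround sketch: ask elasticsearch-hadoop to return date fields as
// raw strings instead of rich Date objects (assumes es.mapping.date.rich
// is supported by 2.1.0.Beta4).
val conf = new SparkConf()
  .setAppName("es-date-workaround")
  .set("es.mapping.date.rich", "false")
val sc = new SparkContext(conf)

val rdd = sc.esRDD("analytic/docs", "?q=*")
rdd.take(10).foreach { case (id, doc) =>
  // With rich dates disabled, "date" should arrive as the raw
  // "20150506T110434.925+0200" string, to be parsed by hand.
  println(s"$id -> ${doc.getOrElse("date", "n/a")}")
}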
Thank you for your answer.
Nicolas PHUNG