Error load as a DataFrame

(Robert) #1

I am trying to load an index as a DataFrame with this command:
val sqlc = new SQLContext(sc)
val usersDF = sqlc.esDF("users-2016/person")

After that, I register it as a temp table: usersDF.registerTempTable("users")

Then I try to execute a simple query:
val result = sqlc.sql("SELECT count(*) FROM users")

but I always get the same error:
java.util.NoSuchElementException: key not found: name

usersDF.printSchema() returns this:

|-- @timestamp: timestamp (nullable = true)
|-- @version: string (nullable = true)
|-- dst_ip: string (nullable = true)
|-- name: string (nullable = true)
|-- request: string (nullable = true)
|-- last_name: string (nullable = true)

The index has 8 million documents.
How can I solve this problem?

(Costin Leau) #2

What version of ES-Hadoop are you using, and what version of Spark? Also, what does your mapping in ES look like?
And what is the full stack trace that you get?

(Robert) #3

First, thanks for answering. The Spark version is 1.4.2 and ES-Hadoop is 2.2.0-m1. I think the problem is that not all fields are present in the Elasticsearch index; for example, the field "name" is missing from 500k documents.
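If sparse fields turn out to be the culprit, one workaround (a sketch, not an official fix for this bug) is to read only the fields you need via the connector's `es.read.field.include` option, using the generic DataFrame reader instead of `esDF`. The index name here matches the one from the question; adjust the option value to your own fields:

```scala
import org.apache.spark.sql.SQLContext

val sqlc = new SQLContext(sc)

// Restrict the connector to fields known to exist in every document,
// so documents missing "name" do not trip up schema-based row building.
val usersDF = sqlc.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.include", "dst_ip,request,last_name")
  .load("users-2016/person")
```

Whether this sidesteps the `key not found` error depends on how the m1 release handles absent keys, so upgrading (as suggested below) is the first thing to try.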

(Costin Leau) #4

Please use ES-Hadoop version 2.2 GA. A lot of things have been fixed since m1 (which was a milestone release).

(Robert) #5


I've upgraded to 2.2.0 GA and now I get this error:

16/02/22 10:44:12 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'positions.start_date' not found; typically this occurs with arrays which are not mapped as single value
at org.elasticsearch.spark.sql.RowValueReader$class.rowColumns(RowValueReader.scala:34)
at org.elasticsearch.spark.sql.ScalaRowValueReader.rowColumns(ScalaEsRowValueReader.scala:14)
at org.elasticsearch.spark.sql.ScalaRowValueReader.createMap(ScalaEsRowValueReader.scala:42)
at org.elasticsearch.hadoop.serialization.ScrollReader.list(
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
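The exception message points at fields that hold arrays without the connector knowing about it. ES-Hadoop provides the `es.read.field.as.array.include` setting for exactly this case; a sketch, assuming `positions` is the array field in this mapping:

```scala
import org.apache.spark.sql.SQLContext

val sqlc = new SQLContext(sc)

// Tell the connector which fields contain arrays, since Elasticsearch
// mappings do not distinguish single values from arrays.
val usersDF = sqlc.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "positions")
  .load("users-2016/person")
```

Nested paths like `positions.start_date` are covered by declaring the parent field; whether that applies here depends on the actual mapping, which is what the follow-up below asks about.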

(Costin Leau) #6

Repeating myself: what is your mapping in ES? What do the docs look like? Do they contain arrays (as the exception message indicates)?

(system) #7