This is the simplest sample I'm trying to get working (code plus Maven dependencies):
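Roughly this (a minimal sketch; the app name, index name, and connection settings are placeholders for my real values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings esRDD() onto SparkContext

object EsCountSample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-count-sample")
      .setMaster("local[*]")
      .set("es.nodes", "localhost")
      .set("es.port", "9200")
    val sc = new SparkContext(conf)

    // Read the index as an RDD and count it -- this is the call that blows up
    val count = sc.esRDD("my-index/my-type").count()
    println(s"Documents in index: $count")
  }
}
```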
It crashes on the .count() call with this stack trace:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.&lt;init&gt;(AbstractEsRDDIterator.scala:28)
at org.elasticsearch.spark.rdd.ScalaEsRDDIterator.&lt;init&gt;(ScalaEsRDD.scala:43)
at org.elasticsearch.spark.rdd.ScalaEsRDD.compute(ScalaEsRDD.scala:39)
at org.elasticsearch.spark.rdd.ScalaEsRDD.compute(ScalaEsRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It seems to be an incompatibility issue, but I can't figure out where it is.
This error typically occurs when a library was compiled against an incompatible Scala version. I'm assuming your Scala version is 2.11, since that's the Spark compatibility level in your dependencies. The missing GenTraversableOnce$class points at a Scala binary-compatibility break: trait implementation classes like this one differ between Scala major versions, so a jar built for Scala 2.10 will throw exactly this kind of NoClassDefFoundError on a 2.11 runtime. I would change your Elasticsearch dependency to the artifact that matches your Scala version, org.elasticsearch:elasticsearch-spark-20_2.11:5.2.0. If you deploy third-party jars to your cluster, make sure all nodes get this updated artifact.
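In your pom.xml that would look something like this (the _2.11 suffix must match the Scala version your Spark artifacts are built for):

```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-20_2.11</artifactId>
  <version>5.2.0</version>
</dependency>
```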