esRDD.count() is not working with my setup

Hi all!

This is the simplest sample I'm trying to get working (code + Maven dependencies).
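
Roughly, the sample boils down to something like this minimal sketch (the host, index name, and conf values here are illustrative placeholders, not my exact setup):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._ // adds esRDD() to SparkContext

    object EsCountSample {
      def main(args: Array[String]): Unit = {
        // Point the connector at the ES cluster; "localhost" is a placeholder.
        val conf = new SparkConf()
          .setAppName("es-count-sample")
          .setMaster("local[*]")
          .set("es.nodes", "localhost:9200")

        val sc = new SparkContext(conf)

        // Read an index as an RDD and count the documents.
        // "my-index/my-type" is a placeholder index/type.
        val rdd = sc.esRDD("my-index/my-type")
        println(rdd.count()) // this is the call that blows up

        sc.stop()
      }
    }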

And it's crashing on the .count() call with this stack trace:
    java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
        at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.<init>(AbstractEsRDDIterator.scala:28)
        at org.elasticsearch.spark.rdd.ScalaEsRDDIterator.<init>(ScalaEsRDD.scala:43)
        at org.elasticsearch.spark.rdd.ScalaEsRDD.compute(ScalaEsRDD.scala:39)
        at org.elasticsearch.spark.rdd.ScalaEsRDD.compute(ScalaEsRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

It seems to be an incompatibility issue, but I can't figure out where it is.

Please assist me.

Thanks, Art.

This error typically occurs when using a library built against an incompatible version of Scala. I'm assuming your Scala version is 2.11, since that's the Spark compatibility level you have in your dependencies. I would change your ES dependency to org.elasticsearch:elasticsearch-spark-20:5.2.0. If you deploy third-party jars to your cluster, make sure all nodes have this updated artifact.
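
If you want to double-check which Scala version your Spark runtime is built for, something like this from the driver (or spark-shell) will print it; the _2.10 / _2.11 suffix on the elasticsearch-spark artifact has to match it (versions shown are just examples):

    // Illustrative check: print the Scala and Spark versions of the running JVM.
    // The Scala binary version here must match the _2.xx suffix of the
    // elasticsearch-spark artifact on the classpath.
    println(scala.util.Properties.versionString) // e.g. "version 2.11.8"
    println(org.apache.spark.SPARK_VERSION)      // e.g. "2.0.2"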

Hey @james.baiera,

Thanks for your help. I actually figured this out the hard way :slight_smile:

For future reference, this is the correct dependency to include in Maven's pom:

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-20_2.11</artifactId>
        <version>5.1.1</version>
    </dependency>

Best, Artem.
