This is the simplest sample I'm trying to get working (code plus Maven dependencies):
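Roughly this (a minimal sketch; the app name, index name, and connection settings are placeholders for my real values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings esRDD() onto SparkContext

object EsCountSample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-count-sample")
      .setMaster("local[*]")
      .set("es.nodes", "localhost")
      .set("es.port", "9200")
    val sc = new SparkContext(conf)

    // Read the index as an RDD and count it -- this is the call that blows up
    val count = sc.esRDD("my-index/my-type").count()
    println(s"Documents in index: $count")
  }
}
```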
It crashes on the .count() call with this stack trace:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.&lt;init&gt;(AbstractEsRDDIterator.scala:28)
at org.elasticsearch.spark.rdd.ScalaEsRDDIterator.&lt;init&gt;(ScalaEsRDD.scala:43)
at org.elasticsearch.spark.rdd.ScalaEsRDD.compute(ScalaEsRDD.scala:39)
at org.elasticsearch.spark.rdd.ScalaEsRDD.compute(ScalaEsRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It seems to be an incompatibility issue, but I can't figure out where it is.
This error typically occurs when a library was compiled against an incompatible Scala version. I'm assuming your Scala version is 2.11, since that's the Spark compatibility level in your dependencies. The missing GenTraversableOnce$class points at a Scala binary-compatibility break: trait implementation classes like this one differ between Scala major versions, so a jar built for Scala 2.10 will throw exactly this kind of NoClassDefFoundError on a 2.11 runtime. I would change your Elasticsearch dependency to the artifact that matches your Scala version, org.elasticsearch:elasticsearch-spark-20_2.11:5.2.0. If you deploy third-party jars to your cluster, make sure all nodes get this updated artifact.
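In your pom.xml that would look something like this (the _2.11 suffix must match the Scala version your Spark artifacts are built for):

```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-20_2.11</artifactId>
  <version>5.2.0</version>
</dependency>
```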