NoClassDefFoundError: Lorg/apache/spark/sql/catalyst/types/StructType with spark_2.10-2.2.0-rc1


(Andrew Wooster) #1

I am running a simple Java Spark SQL driver using
elasticsearch-spark_2.10-2.2.0-rc1 against an Elasticsearch 2.1 server
and spark-1.6.0-bin-hadoop2.6. I get the following exception when I try to access my spark SQL query results:

Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.types.StructType
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
    at java.lang.Class.getDeclaredField(Class.java:2068)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1703)
    at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:484)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598)
...
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

My configuration is as follows:

        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.sql.DataFrame;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SQLContext;

        SparkConf conf = new SparkConf()
            .setAppName("Magellan")
            .setMaster("spark://ec2-xxx-xxx-xx-xxx.compute-1.amazonaws.com:11407")
            .set("es.port", "11100")
            .set("es.scroll.size", "1000")
            .setSparkHome("/home/magellan/servers/dev/opt/spark-1.6.0-bin-hadoop2.6")
            .setJars(new String[] {
                "../lib/elasticsearch-spark-1.2_2.10-2.2.0-rc1.jar",
                "target/magellan-spark-1.0-SNAPSHOT.jar"});
        JavaSparkContext sc = new JavaSparkContext(conf);

        SQLContext sql = new SQLContext(sc);
        DataFrame dataFrame = sql.read().format("org.elasticsearch.spark.sql").load("my_index/doc");
        dataFrame.registerTempTable("tab");
        sql.cacheTable("tab");
        DataFrame data = sql.sql("SELECT title FROM tab");
        Row[] rows = data.head(1);

(Costin Leau) #2

You are using the wrong version of ES-Spark. For some reason you are using the artifact built for Spark 1.0-1.2 while you are running Spark 1.6, so you should be using the default artifact instead. If you are wondering why there are two versions: as you can see in your stack trace, Spark 1.3 broke backwards compatibility in the SQL package by moving and renaming packages.

This is also explained in the docs here.
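If it helps, the same choice expressed as a build dependency sidesteps the jar-name confusion entirely. A sketch of the Maven coordinates, assumed from the jar names in this thread (verify against the ES-Hadoop docs for your release):

```xml
<!-- Default ES-Spark artifact, for Spark 1.3+ (here 1.6): no "-1.2" infix -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark_2.10</artifactId>
    <version>2.2.0-rc1</version>
</dependency>
```

The "-1.2" variant (elasticsearch-spark-1.2_2.10) targets the pre-1.3 SQL packages and is only for Spark 1.0-1.2.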


(Andrew Wooster) #3

Thanks Costin. I guess I was confused by all the version numbers in the jar name.

I replaced:

.setJars(new String[] {
                "../lib/elasticsearch-spark-1.2_2.10-2.2.0-rc1.jar",
                ...

with

.setJars(new String[] {
               "../lib/elasticsearch-spark_2.10-2.2.0-rc1.jar",

and it works!
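A quick way to double-check this kind of mismatch up front is to probe the driver classpath for the class named in the stack trace. A minimal sketch (the two class names come from the error above; `ClasspathCheck` is a hypothetical helper, not part of Spark or ES-Spark):

```java
public class ClasspathCheck {
    // Returns true if the named class is loadable from the current classpath.
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pre-1.3 location, which the elasticsearch-spark-1.2 artifact expects:
        System.out.println("catalyst StructType present: "
            + isPresent("org.apache.spark.sql.catalyst.types.StructType"));
        // Spark 1.3+ location, which the default elasticsearch-spark artifact expects:
        System.out.println("sql.types StructType present: "
            + isPresent("org.apache.spark.sql.types.StructType"));
    }
}
```

On a Spark 1.6 classpath the first should print false and the second true; if the ES-Spark artifact on the classpath expects the one that is absent, you get exactly the NoClassDefFoundError above.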


(Costin Leau) #4

Glad to hear it. I hear you: the Scala suffix, plus the artifact version, plus the Spark version itself is quite confusing (one can't really use S1 vs S2, nor spark1 vs spark2).

