I am running a simple Java Spark SQL driver using elasticsearch-spark_2.10-2.2.0-rc1 against an Elasticsearch 2.1 server and spark-1.6.0-bin-hadoop2.6. When I try to access the results of my Spark SQL query, I get the following exception:
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.types.StructType
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
at java.lang.Class.getDeclaredField(Class.java:2068)
at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1703)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:484)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598)
...
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
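What strikes me is that org.apache.spark.sql.catalyst.types.StructType is the pre-1.3 location of the class: Spark 1.3 moved StructType to org.apache.spark.sql.types, and under Spark 1.6 only the new location resolves on my driver. A minimal check (illustrative only, not part of the app):

public class StructTypeCheck {
    public static void main(String[] args) throws Exception {
        // Resolves fine under Spark 1.6 (the post-1.3 package):
        System.out.println(org.apache.spark.sql.types.StructType.class.getName());
        // The pre-1.3 package -- throws the same ClassNotFoundException the executors report:
        Class.forName("org.apache.spark.sql.catalyst.types.StructType");
    }
}

So it looks like something on the executor classpath still references the old package.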
My configuration is as follows:
SparkConf conf = new SparkConf()
        .setAppName("Magellan")
        .setMaster("spark://ec2-xxx-xxx-xx-xxx.compute-1.amazonaws.com:11407")
        .set("es.port", "11100")
        .set("es.scroll.size", "1000")
        .setSparkHome("/home/magellan/servers/dev/opt/spark-1.6.0-bin-hadoop2.6")
        .setJars(new String[] {
                "../lib/elasticsearch-spark-1.2_2.10-2.2.0-rc1.jar",
                "target/magellan-spark-1.0-SNAPSHOT.jar"
        });
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sql = new SQLContext(sc);

// Load the Elasticsearch index as a DataFrame and register it as a temp table.
DataFrame dataFrame = sql.read().format("org.elasticsearch.spark.sql").load("my_index/doc");
dataFrame.registerTempTable("tab");
sql.cacheTable("tab");

// The query itself is lazy; the exception is raised at head(1), when the job actually runs.
DataFrame data = sql.sql("SELECT title FROM tab");
Row[] rows = data.head(1);
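Since the ClassNotFoundException surfaces while the executors deserialize the task, I suspect a mismatch between the connector on my driver classpath and the jar shipped via setJars. Here is a sketch of how I would check which jar each side actually loads classes from, run on the driver (I am assuming org.elasticsearch.spark.sql.DefaultSource is the class behind the "org.elasticsearch.spark.sql" format):

public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // Jar providing Spark SQL's StructType (should be the Spark 1.6 assembly):
        System.out.println(org.apache.spark.sql.types.StructType.class
                .getProtectionDomain().getCodeSource().getLocation());
        // Jar providing the ES connector's Spark SQL data source:
        System.out.println(Class.forName("org.elasticsearch.spark.sql.DefaultSource")
                .getProtectionDomain().getCodeSource().getLocation());
    }
}

If the connector jar reported here differs from the one listed in setJars, the executors would be running a different connector build than the driver, which could explain the deserialization failure.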