Spark: java.lang.ClassCastException org.elasticsearch.spark.sql.EsDataFrameWriter to field org.elasticsearch.spark.sql.EsSparkSQL

Running es4Hadoop 2.4.4 with Spark 1.6.0 and Java 1.8, when calling:

JavaEsSparkSQL.saveToEs(DATA_FRAME, "consolidation-v2/items");

(where DATA_FRAME is coming from a Parquet file).

I get the following exception:

Caused by: java.lang.ClassCastException: cannot assign instance of org.elasticsearch.spark.sql.EsDataFrameWriter to field org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.eta$0$1$1 of type org.elasticsearch.spark.sql.EsSchemaRDDWriter in instance of org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745) 

But doing the same thing via spark-shell works fine :confused:

That's fairly peculiar. Can you double check which versions of ES-Spark are getting picked up? I'm wondering if there's a mismatch between the versions on your driver and what is running on the cluster... EsSchemaRDDWriter only exists in the Spark 1.2 compatibility distribution; EsDataFrameWriter is used in Spark 1.3 and up.
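For reference, a sketch of what picking a single, consistent connector artifact might look like in a Maven POM. The artifact IDs below follow the es-hadoop 2.4.x naming convention as I understand it and should be verified against the version actually deployed on the cluster:

```xml
<!-- Spark 1.3+ connector (serializes with EsDataFrameWriter) -->
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark_2.10</artifactId>
  <version>2.4.4</version>
</dependency>

<!-- The Spark 1.2 compatibility artifact (which contains EsSchemaRDDWriter)
     is published separately; having both on the classpath -- e.g. one
     compiled into the app and a different one installed on the executors --
     can produce exactly this kind of ClassCastException during
     deserialization. Make sure only one version ends up on driver AND
     executors. -->
```

One way to check which jar a class is actually loaded from at runtime is `EsSparkSQL.class.getProtectionDomain().getCodeSource()`, on both the driver and an executor.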

Yeah, it sounds like a mismatch between compile time and run time.

I now include the es4Hadoop dependency at build time (see https://medium.com/@thomasdecaux/setup-elasticsearch-hadoop-plugin-for-apache-spark-2b6f7a75f77d) and have no more trouble.

Thank you,

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.