Elasticsearch dependency on Zeppelin and EMR breaks the ability to read Parquet files

doskey · November 22, 2016, 7:27am

Hi,
I am running Zeppelin and Spark on Amazon EMR, and when I add a dependency on Elasticsearch (org.elasticsearch:elasticsearch-spark-20_2.11:5.0.1), reading Parquet files starts failing with the following error:
java.io.InvalidClassException: org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat; local class incompatible: stream classdesc serialVersionUID = 8032587261116450585, local class serialVersionUID = -3906549587749639683

In case someone wants to reproduce the problem, this snippet triggers it on EMR 5.0.3:
sqlContext.sql("select 'fooo' as some_column").write.mode("overwrite").parquet("s3a://yours3bucket/somepathinthebucket")
sqlContext.read.parquet("s3a://yours3bucket/somepathinthebucket").createOrReplaceTempView("yayaya")
sqlContext.sql("select * from yayaya").count
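The exception says the serialized ParquetFileFormat class (the "stream classdesc") and the copy loaded by the deserializing JVM (the "local class") have different serialVersionUIDs, i.e. two different builds of that class are in play. A minimal diagnostic sketch, assuming the conflict comes from a second Spark jar on the classpath: report each class's serialVersionUID and the jar it was loaded from. ClassOrigin and describe are hypothetical names, not part of Spark or Zeppelin, and this only helps locate the conflict; it does not fix it.

```scala
import java.io.ObjectStreamClass

// Hypothetical helper (not a Spark/Zeppelin API): where did a class come from,
// and which serialVersionUID does this JVM compute for it?
object ClassOrigin {
  def describe(cls: Class[_]): (Long, String) = {
    // ObjectStreamClass.lookup returns null for classes that are not Serializable
    val desc = ObjectStreamClass.lookup(cls)
    val uid = if (desc == null) 0L else desc.getSerialVersionUID
    // getCodeSource is null for bootstrap classes such as java.lang.String
    val src = Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<bootstrap or unknown>")
    (uid, src)
  }
}
```

In a Zeppelin paragraph one could call ClassOrigin.describe(Class.forName("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat")) on the driver, then run the same call inside something like sc.parallelize(1 to 2).map(_ => ...).collect() to see what the executors load (assuming the notebook-defined helper is shipped to them). If the driver and the executors report different jar locations or UIDs, that would point at a conflicting Spark build pulled in alongside the Elasticsearch dependency.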

Has anyone seen this behavior, and can anyone help solve it?

Thanks!

system · December 20, 2016, 7:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
