Elasticsearch dependency in Zeppelin on EMR breaks reading Parquet files


(Amir) #1

Hi,
I am running Zeppelin and Spark on Amazon EMR. When I add a dependency on Elasticsearch (org.elasticsearch:elasticsearch-spark-20_2.11:5.0.1), reading Parquet files starts failing with the following error:
java.io.InvalidClassException: org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat; local class incompatible: stream classdesc serialVersionUID = 8032587261116450585, local class serialVersionUID = -3906549587749639683
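Java serialization raises `InvalidClassException` with this message when the class description recorded in the byte stream and the class loaded by the receiving JVM report different `serialVersionUID`s. In practice that suggests the sending and receiving JVMs (typically the driver and the executors) loaded different versions of `ParquetFileFormat`. The sketch below is one way to check this. It assumes it is pasted into a Zeppelin Spark paragraph or any Scala 2.11+ REPL. `describeClass` is a hypothetical helper written for this post, not part of Spark or Zeppelin:

```scala
import java.io.ObjectStreamClass

// Hypothetical helper: returns the location a class was loaded from and the
// serialVersionUID this JVM assigns to it. That UID is the value Java
// serialization reports as "local class serialVersionUID" when it
// deserializes an instance of the class in this JVM.
def describeClass(className: String): (String, Long) = {
  val cls = Class.forName(className)
  // Bootstrap (JDK) classes have no CodeSource, so fall back to a marker
  val location = Option(cls.getProtectionDomain.getCodeSource)
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)
    .getOrElse("<bootstrap or unknown>")
  // ObjectStreamClass.lookup returns null for classes that are not Serializable
  val uid = Option(ObjectStreamClass.lookup(cls))
    .map(_.getSerialVersionUID)
    .getOrElse(throw new IllegalArgumentException(s"$className is not Serializable"))
  (location, uid)
}

// java.lang.String declares its serialVersionUID explicitly,
// so every JDK should report the same value for it.
println(describeClass("java.lang.String"))
```

To compare the two sides, one could call `describeClass("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat")` directly in a paragraph (driver side) and inside a small job such as `sc.parallelize(1 to 10).map(_ => describeClass("...")).distinct.collect` (executor side). A different jar location or UID between the two would point at two versions of spark-sql on the classpath. This is a diagnostic idea, not a confirmed fix.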

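In case someone wants to reproduce the problem, the following snippet triggers it on EMR 5.0.3:

```scala
// Write a one-row DataFrame to S3 as Parquet
sqlContext.sql("select 'fooo' as some_column").write.mode("overwrite").parquet("s3a://yours3bucket/somepathinthebucket")
// Read it back and register it as a temporary view
sqlContext.read.parquet("s3a://yours3bucket/somepathinthebucket").createOrReplaceTempView("yayaya")
// Query the view; count runs a job that scans the Parquet files
sqlContext.sql("select * from yayaya").count
```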

Has anyone seen this behavior, and can anyone help solve it?

Thanks!


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.