Elasticsearch-spark not compatible with Spark 2.0.0


(bryce ageno) #1

Our team uses elasticsearch-spark and is currently in the process of upgrading our Spark to version 2.0.0. We get the following errors:

[error] bad symbolic reference. A signature in Column.class refers to type Logging
[error] in package org.apache.spark which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling Column.class.
[error] bad symbolic reference. A signature in SQLContext.class refers to type Logging
[error] in package org.apache.spark which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling SQLContext.class.
[error] bad symbolic reference. A signature in DataFrameReader.class refers to type Logging
[error] in package org.apache.spark which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling DataFrameReader.class

We believe this is due to this change in Spark 2.0.0:

The following features have been removed in Spark 2.0:

Bagel
Support for Hadoop 2.1 and earlier
The ability to configure closure serializer
HTTPBroadcast
TTL-based metadata cleaning
Semi-private class org.apache.spark.Logging. We suggest you use slf4j directly.
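
For context, replacing the removed trait with slf4j directly looks roughly like this (a minimal sketch; `MyJob` is a hypothetical class standing in for anything that used to mix in `org.apache.spark.Logging`):

```scala
import org.slf4j.{Logger, LoggerFactory}

// Sketch of the slf4j replacement the release notes suggest.
class MyJob {
  private val log: Logger = LoggerFactory.getLogger(classOf[MyJob])

  def run(): Unit = {
    log.info("starting job")   // previously: logInfo("starting job") from the Logging trait
  }
}
```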

Is this a known issue? If so, are there any plans to fix it?

Thanks,

Bryce


(Jun Ohtani) #2

Spark 2.0 is now supported as of es-hadoop 5.0 alpha5.

See the 5.0 alpha5 release blog: https://www.elastic.co/blog/es-hadoop-5-0-0-alpha5
and https://github.com/elastic/elasticsearch-hadoop/issues/647
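
If you are on sbt, the Spark 2.0 artifact can be pulled in roughly like this (a sketch; verify the exact artifact name and version against the release blog above):

```scala
// build.sbt sketch -- coordinates assumed from the 5.0.0-alpha5 release;
// double-check the artifact name and Scala version suffix before using.
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.0-alpha5"
```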


(bryce ageno) #3

Thanks, I will try that.


(Jonathan Spooner) #4

Why is Spark 2.0 support rolled into the ES 5.0 release? We are upgrading our Spark from 1.6 to 2.0 right now. What do you suggest we do?


(James Baiera) #5

With Spark 2.0 not being backwards compatible with 1.6, and with it becoming the new default Spark version for ES-Hadoop, we decided to align its support with the 5.0 release. We felt that a break in binary compatibility should only happen with a major version increase.

While I highly advise not using the beta release for a production deployment, I do suggest that you perform your testing with the beta to ensure a successful rollout when 5.0 eventually lands.


(Jonathan Spooner) #6

Hi James,

How is Spark not backwards compatible? The Spark 1.6 to 2.0 upgrade is fairly simple as you can see from the upgrading docs.

Right now I'm finishing up migrating our ETL jobs to Spark 2.0 and next I'll be updating the elasticsearch publishers that use es-hadoop. We have a 4TB elasticsearch cluster that we will not be upgrading to 5.0 this year.

What are my options for using Spark 2.0 with elasticsearch 2.4.0?


(James Baiera) #7

The biggest item causing the binary incompatibility between 1.3-1.6 and 2.0 is the replacement of the DataFrame class with Dataset. DataFrame continues to exist for users, but only as a type alias for Dataset[Row]. This allows code written by users to continue to work with just a simple recompilation.
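
A minimal sketch of what that alias means for user code (Spark 2.0 APIs; the object and method names here are just for illustration):

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// In Spark 2.0, DataFrame is only a type alias for Dataset[Row], so a
// method written against DataFrame in 1.x still compiles unchanged after
// recompilation against 2.0 -- but the compiled 1.x bytecode differs,
// which is what breaks pre-built libraries at runtime.
object AliasDemo {
  def rowCount(df: DataFrame): Long = df.count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias").getOrCreate()
    val ds: Dataset[Row] = spark.range(5).toDF()  // a Dataset[Row] is a DataFrame
    println(rowCount(ds))                         // prints 5
    spark.stop()
  }
}
```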

However, since ES-Hadoop is built by extending native Spark interfaces and must be distributed to other users, the compiled classes cannot reconcile the differences between 1.3-1.6 and 2.0 at runtime. Thus we have two separate distributions for the versions in 5.0.

It is also our policy to keep the default versions used in the main ES-Hadoop jar in lock step with the latest version of the technology that we support. Because of this, support for Spark 1.3-1.6 has been added as a separate artifact for backwards compatibility in 5.0.
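
In dependency terms, this means choosing the artifact that matches your Spark version (a sketch; artifact names assumed from the 5.0 layout, so verify them against the ES-Hadoop documentation):

```scala
// build.sbt sketch -- verify coordinates against the ES-Hadoop docs.
// Spark 2.0 (the default in 5.0):
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.0"
// Spark 1.3-1.6 (separate backwards-compatibility artifact):
// libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-13_2.10" % "5.0.0"
```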

When ES-Hadoop v5.0 is released, it will support both Spark 2.0 and Elasticsearch 2.4.0. I am sorry to say that earlier versions of ES-Hadoop will not have support for Spark 2.0 due to the disruptiveness of the version change on the rest of the project. We have decided to reserve this breaking change for a major version release in keeping with the semantic versioning principles.


(new to Es) #8

I'm sorry to disturb you.

As you mentioned above, I just wonder whether ES 2.3 can work with Spark 2.0.2 now.

I recently tried to get this working but failed.

I have already tried es-hadoop 2.3/2.4/5.0/5.1, and none of them works with ES 2.3 and Spark 2.0.2.

Is there any way I can solve this problem?
@james.baiera


(system) #9