Elasticsearch-spark not compatible with Spark 2.0.0


(bryce ageno) #1

Our team uses elasticsearch-spark and is currently in the process of upgrading our Spark to version 2.0.0. We get the following errors:

[error] bad symbolic reference. A signature in Column.class refers to type Logging
[error] in package org.apache.spark which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling Column.class.
[error] bad symbolic reference. A signature in SQLContext.class refers to type Logging
[error] in package org.apache.spark which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling SQLContext.class.
[error] bad symbolic reference. A signature in DataFrameReader.class refers to type Logging
[error] in package org.apache.spark which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling DataFrameReader.class

We believe this is due to this change in Spark 2.0.0:

The following features have been removed in Spark 2.0:

Bagel
Support for Hadoop 2.1 and earlier
The ability to configure closure serializer
HTTPBroadcast
TTL-based metadata cleaning
Semi-private class org.apache.spark.Logging. We suggest you use slf4j directly.
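
For context, replacing the removed trait with slf4j directly looks roughly like this (a minimal sketch; `MyJob` is a hypothetical class standing in for anything that used to mix in `org.apache.spark.Logging`):

```scala
import org.slf4j.{Logger, LoggerFactory}

// Sketch of the slf4j replacement the release notes suggest.
class MyJob {
  private val log: Logger = LoggerFactory.getLogger(classOf[MyJob])

  def run(): Unit = {
    log.info("starting job")   // previously: logInfo("starting job") from the Logging trait
  }
}
```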

Is this a known issue? If so, are there any plans to fix it?

Thanks,

Bryce


(Jun Ohtani) #2

Spark 2.0 is now supported as of es-hadoop 5.0 alpha5.

See the 5.0 alpha5 release blog: https://www.elastic.co/blog/es-hadoop-5-0-0-alpha5
and https://github.com/elastic/elasticsearch-hadoop/issues/647
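
If you are on sbt, the Spark 2.0 artifact can be pulled in roughly like this (a sketch; verify the exact artifact name and version against the release blog above):

```scala
// build.sbt sketch -- coordinates assumed from the 5.0.0-alpha5 release;
// double-check the artifact name and Scala version suffix before using.
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.0-alpha5"
```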


(bryce ageno) #3

Thanks, I will try that.


(Jonathan Spooner) #4

Why is Spark 2.0 support rolled into the ES 5.0 release? We are upgrading our Spark from 1.6 to 2.0 right now. What do you suggest we do?


(James Baiera) #5

With Spark 2.0 not being backwards compatible with 1.6, and with it becoming the new default Spark version for ES-Hadoop, we decided to align its support with the 5.0 release. We felt that a break in binary compatibility should only happen with a major version increase.

While I highly advise not using the beta release for a production deployment, I do suggest that you perform your testing with the beta to ensure a successful rollout when 5.0 eventually lands.


(Jonathan Spooner) #6

Hi James,

How is Spark not backwards compatible? The Spark 1.6 to 2.0 upgrade is fairly simple as you can see from the upgrading docs.

Right now I'm finishing up migrating our ETL jobs to Spark 2.0 and next I'll be updating the elasticsearch publishers that use es-hadoop. We have a 4TB elasticsearch cluster that we will not be upgrading to 5.0 this year.

What are my options for using Spark 2.0 with elasticsearch 2.4.0?


(James Baiera) #7

The biggest item causing the binary incompatibility between 1.3-1.6 and 2.0 is the replacement of the DataFrame class with Dataset. DataFrame continues to exist for users, but only as a type alias for Dataset[Row]. This allows code written by users to continue to work with just a simple recompilation.
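
A minimal sketch of what that alias means for user code (Spark 2.0 APIs; the object and method names here are just for illustration):

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// In Spark 2.0, DataFrame is only a type alias for Dataset[Row], so a
// method written against DataFrame in 1.x still compiles unchanged after
// recompilation against 2.0 -- but the compiled 1.x bytecode differs,
// which is what breaks pre-built libraries at runtime.
object AliasDemo {
  def rowCount(df: DataFrame): Long = df.count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias").getOrCreate()
    val ds: Dataset[Row] = spark.range(5).toDF()  // a Dataset[Row] is a DataFrame
    println(rowCount(ds))                         // prints 5
    spark.stop()
  }
}
```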

However, since ES-Hadoop is built by extending native Spark interfaces and must be distributed to other users, the compiled classes cannot reconcile the differences between 1.3-1.6 and 2.0 at runtime. Thus we have two separate distributions for the versions in 5.0.

It is also our policy to keep the default versions used in the main ES-Hadoop jar in lock step with the latest version of the technology that we support. Because of this, support for Spark 1.3-1.6 has been added as a separate artifact for backwards compatibility in 5.0.
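
In dependency terms, this means choosing the artifact that matches your Spark version (a sketch; artifact names assumed from the 5.0 layout, so verify them against the ES-Hadoop documentation):

```scala
// build.sbt sketch -- verify coordinates against the ES-Hadoop docs.
// Spark 2.0 (the default in 5.0):
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.0"
// Spark 1.3-1.6 (separate backwards-compatibility artifact):
// libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-13_2.10" % "5.0.0"
```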

When ES-Hadoop v5.0 is released, it will support both Spark 2.0 and Elasticsearch 2.4.0. I am sorry to say that earlier versions of ES-Hadoop will not have support for Spark 2.0 due to the disruptiveness of the version change on the rest of the project. We have decided to reserve this breaking change for a major version release in keeping with the semantic versioning principles.


(new to Es) #8

I'm sorry to disturb you.

As you mentioned above, I just wonder whether ES 2.3 can work with Spark 2.0.2 now.

I recently tried to get this working but failed.

I have already tried es-hadoop 2.3/2.4/5.0/5.1, and none of them works with ES 2.3 and Spark 2.0.2.

Is there any way I can solve this problem?
@james.baiera


(system) #9