SparkSQL to Elasticsearch compatibility problem

Hi,
I have one cluster with Hadoop (Cloudera 5.8) and a separate cluster with ES 5.2.2.
I want to use Spark to write data from Hive to ES, but I have problems with the Java version.

Java version: 1.7
ES version: 5.2.2
Spark: 1.6
Scala: 2.10

On the Hadoop cluster I have Java 1.7. In my POM file I use the "elasticsearch-spark_2.10" connector, version 2.2.1.

When I use elasticsearch-spark_2.10 version 2.2.1, I get this error:

2017-03-16 13:04:02 DEBUG EsDataFrameWriter:180 - Discovered Elasticsearch version [5.2.2]
2017-03-16 13:04:02 DEBUG HttpMethodBase:1024 - Resorting to protocol version default close connection policy
2017-03-16 13:04:02 DEBUG HttpMethodBase:1028 - Should NOT close connection, using HTTP/1.1
2017-03-16 13:04:02 DEBUG HttpConnection:1178 - Releasing connection back to connection manager.
2017-03-16 13:04:02 ERROR Executor:95 - Exception in task 1.1 in stage 5.0 (TID 1254)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens when accessing a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:190)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:55)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:55)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version 5.2.2
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:185)
... 10 more

If I change my POM to use elasticsearch-spark_2.10 version 5.0.0-alpha4, which is the version compatible with ES 5.2.2, I get a different error:

ApplicationMaster:95 - User class threw exception: java.lang.UnsupportedClassVersionError: org/elasticsearch/spark/rdd/CompatUtils : Unsupported major.minor version 52.0
java.lang.UnsupportedClassVersionError: org/elasticsearch/spark/rdd/CompatUtils : Unsupported major.minor version 52.0

I think this is because elasticsearch-spark_2.10 (both versions?) is compiled with Java 1.8 while my environment runs Java 1.7. If so, is there a way to recompile elasticsearch-spark_2.10 for Java 1.7?

Thanks.

@Eduardo_Curonisy In the first case you provided, you are using version 2.2.1 of ES-Hadoop which can only interact with earlier versions of Elasticsearch (2.2 and below). So you are correct in that you need to upgrade to a newer version of the connector.

I would avoid using the *-alpha releases of the connector at this point. ES-Hadoop is released in lock step with Elasticsearch now, so version 5.2.2 is already out and will be the most compatible with your version of Elasticsearch. Generally, it's best to keep ES-Hadoop at the same version or higher (we support backwards compatibility).
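
As a rough illustration of that rule of thumb, the sketch below compares a connector version against the cluster version numerically. The class and method names are hypothetical and not part of ES-Hadoop, and pre-release suffixes such as "-alpha4" are simply dropped, which is a simplification.

```java
class ConnectorVersionRule {
    /** Parses "major.minor.patch", ignoring any suffix after '-' (e.g. "-alpha4"). */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] nums = new int[3]; // missing components default to 0
        for (int i = 0; i < 3 && i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    /** Returns true when the connector version is the same as or higher than the ES version. */
    static boolean connectorAtLeast(String connector, String es) {
        int[] c = parse(connector);
        int[] e = parse(es);
        for (int i = 0; i < 3; i++) {
            if (c[i] != e[i]) {
                return c[i] > e[i];
            }
        }
        return true; // identical numeric versions
    }

    public static void main(String[] args) {
        String es = "5.2.2";
        for (String connector : new String[] {"2.2.1", "5.0.0-alpha4", "5.2.2"}) {
            System.out.println(connector + " vs ES " + es + ": "
                    + (connectorAtLeast(connector, es) ? "ok" : "too old"));
        }
    }
}
```

Under this check, both connector versions tried in the question (2.2.1 and 5.0.0-alpha4) are older than the 5.2.2 cluster, while 5.2.2 matches it.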

It was also discovered early in the 5.0 release cycle that a change in the build process meant that Scala classes in the Spark package were being compiled for Java 8 instead of in compatibility mode for Java 6. This was fixed and should be correct in version 5.2.2.
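
To check which Java release a given class in a connector jar targets, one option is to read the class file header: after the 0xCAFEBABE magic number come the minor and major version, and major version 52 corresponds to Java 8 (51 to Java 7, 50 to Java 6). The following is a minimal sketch using only the JDK; the class name ClassVersionCheck and its command-line arguments are illustrative assumptions.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class ClassVersionCheck {
    /** Reads the class file header from the stream and returns the major version. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file (bad magic number)");
        }
        data.readUnsignedShort();        // minor version, not needed here
        return data.readUnsignedShort(); // major version, e.g. 52
    }

    /** Maps a class file major version to a Java release: 50 -> 6, 51 -> 7, 52 -> 8. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ClassVersionCheck <jar> <entry, e.g. pkg/Name.class>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            JarEntry entry = jar.getJarEntry(args[1]);
            if (entry == null) {
                throw new IOException("No such entry in jar: " + args[1]);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                int major = majorVersion(in);
                System.out.println(args[1] + ": major version " + major
                        + " (Java " + javaRelease(major) + ")");
            }
        }
    }
}
```

It could be run against the class named in the error above, for example with the jar path as the first argument and org/elasticsearch/spark/rdd/CompatUtils.class as the second. A reported major version above 51 means the class will not load on a Java 7 runtime.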
