Elastic Stack 5.4.0 is breaking for Spark 2.1.1

Does the current library support Spark 2.1.1?
The code is breaking for me: printSchema works fine, but show and collect break.

Could you link the logs/stacktraces that you are seeing? Have you upgraded from a previous version or is this a fresh install?

James

I did a fresh installation of Spark 2.1.1 and installed ELK 5.4 and ES-Hadoop
5.4. With the previous version I had issues with reading a child array into a
dataframe. I can printSchema, but select crashes.

Having upgraded to the latest versions of Spark and ELK, I am unable to run a
basic select on a dataframe. Please refer to the stack trace below.
Traceback (most recent call last):
  File "/Users/anupamjaiswal/Documents/aj/pyspk.py", line 28, in <module>
    df.select("tags").show()
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 318, in show
    print(self._jdf.showString(n, 20))
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o37.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
	at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.<init>(AbstractEsRDDIterator.scala:28)
	at org.elasticsearch.spark.sql.ScalaEsRowRDDIterator.<init>(ScalaEsRowRDD.scala:49)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:45)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

Which version of Scala are you running Spark on top of, and which specific version of ES-Hadoop are you using? Generally, seeing java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class means that you are mixing Scala versions 2.10 and 2.11.
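One quick way to check for such a mismatch (the paths and script name below are examples; adjust to your install): print the Scala version your Spark build reports, then make sure the ES-Hadoop artifact suffix matches it.

```shell
# Print the Scala version this Spark build was compiled against
spark-shell --version 2>&1 | grep -i scala

# If it reports Scala 2.11.x, the connector coordinate must end in _2.11, e.g.:
spark-submit --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.4.0 your_script.py
```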

We are using PySpark for coding, and the Scala version is 2.11.8.

Is there any update on this?

Sorry for disappearing there. This error almost always means that you are using the incorrect version of ES-Hadoop for your version of Scala. Do you have the artifact name for elasticsearch-spark that you are using? It should end in _2.11

My bad, I was using the wrong version. Do you have any working example of
exploding nested JSON stored in Elasticsearch? I would appreciate your help.
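For future readers: in PySpark the usual tool for this is pyspark.sql.functions.explode, which turns each element of an array column into its own row (ES-Hadoop's es.read.field.as.array.include option can also help the connector map nested arrays correctly). The document and field names below are hypothetical; this is a plain-Python sketch of what explode does to one document, not the Spark API itself.

```python
# Plain-Python sketch of what Spark SQL's explode() does to a nested document:
# one input row with an array column becomes one output row per array element.
doc = {"user": "anna", "tags": ["search", "spark", "es"]}  # hypothetical document

exploded = [{"user": doc["user"], "tag": tag} for tag in doc["tags"]]

for row in exploded:
    print(row)
# {'user': 'anna', 'tag': 'search'}
# {'user': 'anna', 'tag': 'spark'}
# {'user': 'anna', 'tag': 'es'}
```

In PySpark itself this would be roughly df.select("user", explode("tags").alias("tag")).show(), assuming the tags field was read back as an array type.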

I have the same problem; my Scala version and elasticsearch-spark artifact are both 2.11.
I can successfully insert data into Elasticsearch, but I can't read data from it.
(with Elastic Stack 5.4.0 and Spark 2.1.1)

My code:

val es_df = sqc.read.format("org.elasticsearch.spark.sql").load("test/daily_player_game")

My Maven Dependencies:

		<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.11</artifactId>
			<version>2.1.1</version>
		</dependency>

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming_2.11</artifactId>
			<version>2.1.1</version>
		</dependency>

		<!-- https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-hadoop -->
		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch-spark-20_2.11</artifactId>
			<version>5.3.1</version>
		</dependency>

Error Message:

17/06/13 10:01:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.183, executor 2): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
	at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.<init>(AbstractEsRDDIterator.scala:28)
	at org.elasticsearch.spark.sql.ScalaEsRowRDDIterator.<init>(ScalaEsRowRDD.scala:49)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:45)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

@byakuinss, are you sure that there are no conflicting jars on your cluster?

Yes. There is only one node in my ELK cluster, and it's a new server that did not have any old-version jars on it.
