Does the current library support Spark 2.1.1? The code is breaking for me: printSchema works fine, but show and collect break.
Could you link the logs/stacktraces that you are seeing? Have you upgraded from a previous version or is this a fresh install?
James
I did a fresh installation of Spark 2.1.1 and installed ELK 5.4 and ES-Hadoop
5.4. On the previous version I had issues with reading a child array into a
dataframe. I can printSchema, but select crashes.
After upgrading to the latest versions of Spark and ELK, I am unable to run a
basic select from a dataframe. Please refer to the stack trace below.
Traceback (most recent call last):
  File "/Users/anupamjaiswal/Documents/aj/pyspk.py", line 28, in <module>
    df.select("tags").show()
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 318, in show
    print(self._jdf.showString(n, 20))
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o37.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
	at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.<init>(AbstractEsRDDIterator.scala:28)
	at org.elasticsearch.spark.sql.ScalaEsRowRDDIterator.<init>(ScalaEsRowRDD.scala:49)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:45)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
Which version of Scala are you running Spark on top of, and which specific version of ES-Hadoop are you using? Generally, seeing java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
means that you are mixing Scala versions 2.10 and 2.11.
We are using PySpark for coding and have Scala version 2.11.8.
Do we have any update?
Sorry for disappearing there. This error almost always means that you are using the incorrect version of ES-Hadoop for your version of Scala. Do you have the artifact name for elasticsearch-spark that you are using? It should end in _2.11
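For reference, a coordinate that matches Scala 2.11 would look something like the following sketch; the 5.4.0 version number here is only illustrative, so pick the release that matches your Elastic Stack:

```xml
<!-- Spark 2.x build of ES-Hadoop compiled against Scala 2.11.
     The _2.11 suffix on artifactId must match the Scala version
     your Spark distribution was built with. -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>5.4.0</version>
</dependency>
```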
My bad, I was using the wrong version. Do you have a working example of
exploding nested JSON stored in Elasticsearch? I would appreciate your help.
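For what it's worth, the usual approach is pyspark.sql.functions.explode, which turns each element of an array column into its own row. Since there is no Elasticsearch index attached to this thread, here is a plain-Python sketch of that row-multiplying behaviour; the sample data and field names are invented for illustration:

```python
# Plain-Python sketch of what pyspark.sql.functions.explode does to an
# array column: each element of the array becomes its own output row.
# The PySpark equivalent, for a hypothetical DataFrame `df` read from
# Elasticsearch, would be roughly:
#   from pyspark.sql.functions import explode
#   df.select("name", explode("tags").alias("tag")).show()

docs = [
    {"name": "doc1", "tags": ["spark", "elasticsearch"]},
    {"name": "doc2", "tags": ["hadoop"]},
]

def explode_rows(records, array_field):
    """Emit one row per element of `array_field`, copying the other fields."""
    rows = []
    for rec in records:
        for element in rec.get(array_field, []):
            row = {k: v for k, v in rec.items() if k != array_field}
            row[array_field] = element
            rows.append(row)
    return rows

exploded = explode_rows(docs, "tags")
# exploded now holds three rows: doc1/spark, doc1/elasticsearch, doc2/hadoop
```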
I have the same problem; my Scala version and elasticsearch-spark are both 2.11.
I can successfully insert data into Elasticsearch, but I can't read data from Elasticsearch
(with Elastic Stack 5.4.0 and Spark 2.1.1).
My code:
val es_df = sqc.read.format("org.elasticsearch.spark.sql").load("test/daily_player_game")
My Maven Dependencies:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-hadoop -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>5.3.1</version>
</dependency>
Error Message:
17/06/13 10:01:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.183, executor 2): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
	at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.<init>(AbstractEsRDDIterator.scala:28)
	at org.elasticsearch.spark.sql.ScalaEsRowRDDIterator.<init>(ScalaEsRowRDD.scala:49)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:45)
	at org.elasticsearch.spark.sql.ScalaEsRowRDD.compute(ScalaEsRowRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
@byakuinss, are you sure that there are no conflicting jars on your cluster?
Yes. There is only one node in my ELK cluster, and it is a new server that did not have any old-version jars on it.