Zeppelin and Elasticsearch-Spark incompatibility

I am running a Zeppelin 0.7.1 instance and want to use the Elasticsearch Spark connector. I added the dependency:


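In Zeppelin 0.7, interpreter dependencies are added as Maven coordinates under Interpreter → spark → Dependencies. The artifact name here is inferred from the Scala-version note later in this post, and the version is a placeholder:

```
org.elasticsearch:elasticsearch-spark-20_2.10:<version>
```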
And yet I get the following error message every time I run any code in the Spark interpreter (even a trivial import statement or a plain sc.version):

java.lang.NullPointerException
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)
	at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)
	at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

This is strictly related to this dependency: if I remove it, the interpreter works flawlessly. I also tested the Scala 2.11 build of the connector (i.e. -20_2.11) and the same error occurs, so it is not a Scala version incompatibility.

I also added the following properties in the interpreter settings:

spark.es.nodes	localhost
spark.es.port	9200
spark.es.index.auto.create	true
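For reference, es-spark reads these settings from the SparkConf (the spark. prefix is stripped), so once the connector loads, a minimal write should work. A sketch, assuming a reachable Elasticsearch node; the index name is hypothetical:

```scala
// es-spark's RDD extension methods (saveToEs etc.)
import org.elasticsearch.spark._

// Build a small RDD of documents and write it to Elasticsearch.
// With spark.es.index.auto.create=true the connector creates the
// index on first write.
val docs = sc.makeRDD(Seq(
  Map("title" -> "doc1", "views" -> 10),
  Map("title" -> "doc2", "views" -> 20)
))
docs.saveToEs("zeppelin-test/docs") // "<index>/<type>" resource string
```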

Is there some other configuration that I haven't added yet? Any idea why this NullPointerException happens?

Edit: Some clarifications. When I run a local Zeppelin against a standalone Spark cluster on my local machine, I do not get any errors and can use the Elastic dependency without problems. However, I need to use another Zeppelin instance hosted on a different machine; only through that Zeppelin can I reach the proper, larger Spark cluster for heavy jobs. The same settings do not work there, and I get the error above.

Edit 2: I finally found out where the incompatibility lies: the elasticsearch-spark connector clashes, for some reason, with my play-json dependency:


If I remove this dependency (which I actually need), the es-spark connection works fine. However, I cannot get any further: I don't know how to find out which packages/transitive dependencies of play-json and es-spark are actually clashing and causing this.
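One way to narrow down the clash (a sketch, assuming you can declare both dependencies in a local Maven or sbt build) is to print both dependency trees and compare the shared artifacts:

```shell
# In a Maven project declaring both elasticsearch-spark and play-json:
# print the full tree, including conflicting versions Maven omitted
mvn dependency:tree -Dverbose

# In an sbt project (sbt 1.4+ has dependencyTree built in):
sbt dependencyTree
```

Any artifact that appears in both trees at different versions is a candidate; Zeppelin's interpreter dependency table also has an exclusions field where a groupId:artifactId can be excluded from one of the two dependencies to test the hypothesis.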
