Is there a guide for creating an uber jar with spark-core and elasticsearch-spark?

(Jungtaek Lim) #1


I'm using Spark 1.4.0 and have just started experimenting with elasticsearch-hadoop.
I don't want Spark to pull in libraries at runtime, so I build an uber jar whenever I create a new driver.

I added "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" to build.sbt, ran "sbt assembly", and hit a deduplication error:

java.lang.RuntimeException: deduplicate: different file contents found in the following:

I excluded spark-core from elasticsearch-spark, but with no luck.

So I'd like to know the best practice for excluding libraries, so that I can maintain an uber jar that contains elasticsearch-spark.
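(For the record, the usual workaround for this specific deduplicate error is an explicit merge strategy for sbt-assembly. The rules below are a common sketch, not a definitive fix; which files actually conflict varies by project, and the setting name depends on your sbt-assembly version:)

```
// build.sbt — requires the sbt-assembly plugin
assemblyMergeStrategy in assembly := {
  // conflicting META-INF metadata is usually safe to drop
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // otherwise keep the first copy found on the classpath
  case x => MergeStrategy.first
}
```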

Thanks in advance!

(Angel Faus) #2


I had the same problem and ended up just adding the elasticsearch .jar as an "unmanaged" dependency (simply placing it in the /lib/ folder of my project). Hope that works for you too.
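For reference, the "unmanaged" approach can look like this (the jar file name is an assumption; use the one matching your Scala version):

```
my-project/
  build.sbt
  lib/
    elasticsearch-spark_2.10-2.1.0.jar
```

sbt picks up any jars in lib/ automatically; no libraryDependencies entry is needed for them.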

(Costin Leau) #3

By the way, elasticsearch-spark is available as a Spark package, so when using Spark 1.2 you can simply specify it on the command line when submitting your job.
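The command-line form can look like the following sketch (the driver class and jar names are hypothetical, and the package coordinates assume Scala 2.10 and elasticsearch-spark 2.1.0):

```
spark-submit --packages org.elasticsearch:elasticsearch-spark_2.10:2.1.0 \
  --class com.example.MyDriver my-driver.jar
```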

(Jungtaek Lim) #4

Does that mean I can mark elasticsearch-spark as "provided" and just link it when submitting?

(Jungtaek Lim) #5

I changed the "elasticsearch-spark" dependency to "provided", specified the elasticsearch-spark package when submitting, and it finally works!
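The "provided" change can be sketched like this in build.sbt (the version is the one mentioned earlier in the thread):

```
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" % "provided"
```

With "provided" scope, sbt assembly compiles against the library but leaves it out of the uber jar, so the package specified at submit time supplies it on the cluster instead.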

(Costin Leau) #6

Glad to hear it. Can you please post your sbt setup as a post/git/gist? It might be useful to others trying to do the same thing in the future.


(Jungtaek Lim) #7

Sure! Here it is.

(system) #8