Is there a guide for creating an uber jar with spark-core and elasticsearch-spark?


(Jungtaek Lim) #1

Hi!

I'm using Spark 1.4.0 and have just started experimenting with elasticsearch-hadoop.
I don't want to let Spark add libraries for me, so I build an uber jar whenever I write a new driver.

I added "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" to build.sbt and ran "sbt assembly", but hit a deduplication error:

java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/heartsavior/.ivy2/cache/com.esotericsoftware.kryo/kryo/bundles/kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class
/Users/heartsavior/.ivy2/cache/com.esotericsoftware.minlog/minlog/jars/minlog-1.2.jar:com/esotericsoftware/minlog/Log$Logger.class

I tried excluding spark-core from elasticsearch-spark, with no luck.

So I'd like to know the best practice for excluding libraries, so that I can maintain an uber jar that contains elasticsearch-spark.

Thanks in advance!


(Angel Faus) #2

Hi,

I had the same problem and ended up just adding the elasticsearch .jar as an "unmanaged" dependency (just placing it in the /lib/ folder of my project). Hope that works for you too.
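
If you would rather keep the dependency managed, a merge strategy in sbt-assembly is another option that is often used for this kind of "deduplicate" error. The following is only a minimal sketch, not something tested against this exact build: it assumes sbt 1.x with the sbt-assembly plugin enabled (older builds write `assemblyMergeStrategy in assembly` instead of the slash syntax), and it assumes that keeping the first copy of the minlog classes is acceptable for your job.

```scala
// build.sbt (sketch; assumes the sbt-assembly plugin is enabled in project/plugins.sbt)
assembly / assemblyMergeStrategy := {
  // kryo-2.21.jar bundles its own copy of the minlog classes, which clashes
  // with minlog-1.2.jar. Keeping the first copy found is an assumption here.
  case PathList("com", "esotericsoftware", "minlog", _*) => MergeStrategy.first
  // Everything else falls back to the plugin's default strategy.
  case other =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(other)
}
```

With a rule like this, "sbt assembly" should no longer stop on com/esotericsoftware/minlog/Log$Logger.class, although other conflicting paths may still need their own cases.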


(Costin Leau) #3

Btw, elasticsearch-spark is available as a Spark package, so when using
Spark 1.2, you can simply specify it on the command line when submitting
your job.


(Jungtaek Lim) #4

Does that mean I can mark elasticsearch-spark as "provided" and just link it when submitting?


(Jungtaek Lim) #5

I changed the "elasticsearch-spark" dependency to "provided" and specified the elasticsearch-spark package when submitting, and it finally works!
Thanks!
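
For anyone following along, the change described above might look roughly like this. It is a minimal sketch rather than the exact build used in this thread: the Scala version, the "provided" scope on spark-core, the driver class name, and the jar path are all assumptions made for illustration.

```scala
// build.sbt (sketch)
scalaVersion := "2.10.4" // assumed; must match the _2.10 suffix used at submit time

libraryDependencies ++= Seq(
  // Supplied by the cluster at runtime, so kept out of the uber jar (assumption).
  "org.apache.spark"  %% "spark-core"          % "1.4.0" % "provided",
  // Kept out of the uber jar; resolved by spark-submit via --packages instead.
  "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" % "provided"
)

// Submitting (hypothetical class name and jar path):
//   spark-submit \
//     --packages org.elasticsearch:elasticsearch-spark_2.10:2.1.0 \
//     --class com.example.MyDriver \
//     target/scala-2.10/my-driver-assembly-0.1.jar
```

Because both dependencies are "provided", sbt-assembly leaves them and their transitive dependencies out of the jar, which should also sidestep the kryo/minlog duplicate reported in the first post.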


(Costin Leau) #6

Glad to hear it. Can you please post your sbt setup as a post/git/gist? It might be useful to others trying to do the same thing in the future.

Thanks,


(Jungtaek Lim) #7

Sure! Here it is.

