I'm using Spark 1.4.0 and just started experimenting with elasticsearch-hadoop.
I don't want to rely on Spark adding libraries at runtime, so I build an uber jar whenever I write a new driver.
I added "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" to build.sbt, and ran "sbt assembly", and met issue from deduplication.
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/heartsavior/.ivy2/cache/com.esotericsoftware.kryo/kryo/bundles/kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class
/Users/heartsavior/.ivy2/cache/com.esotericsoftware.minlog/minlog/jars/minlog-1.2.jar:com/esotericsoftware/minlog/Log$Logger.class
I tried excluding spark-core from elasticsearch-spark, with no luck.
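The exclusion looked roughly like this (a sketch; the spark-core_2.10 artifact name assumes a Scala 2.10 build):

libraryDependencies += ("org.elasticsearch" %% "elasticsearch-spark" % "2.1.0")
  .exclude("org.apache.spark", "spark-core_2.10") // keep Spark's own classes out of the uber jar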
So I'd like to know the best practice for excluding libraries so that I can maintain an uber jar that contains elasticsearch-spark.
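For reference, the usual sbt-assembly workaround for this particular collision is to keep only one copy of the duplicated minlog classes. A minimal sketch, assuming sbt-assembly 0.13+ and its assemblyMergeStrategy key (older plugin versions use mergeStrategy in assembly):

// kryo bundles its own copy of the com.esotericsoftware.minlog classes,
// which collide with the standalone minlog jar; keeping the first copy
// is usually safe when the two versions are compatible.
assemblyMergeStrategy in assembly := {
  case PathList("com", "esotericsoftware", "minlog", _*) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}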
I had the same problem and ended up just adding the elasticsearch .jar as an "unmanaged" dependency (just placing it in the lib/ folder of my project). Hope that works for you too.
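No build.sbt change is strictly needed for that: lib/ is sbt's default unmanagedBase, so any jar dropped there lands on the classpath and in the assembly. Making it explicit (the jar file name below is just an example and may differ):

// lib/ is already the default; shown only for clarity.
// e.g. place lib/elasticsearch-spark_2.10-2.1.0.jar in the project
unmanagedBase := baseDirectory.value / "lib"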
Btw, elasticsearch-spark is available as a Spark package, so as of Spark 1.2 you can simply specify it on the command line when submitting your job.
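Something like the following, assuming the artifact's Maven coordinates (the driver class and jar names are placeholders):

# --packages resolves the artifact and its dependencies at submit time,
# so elasticsearch-spark no longer has to live inside the uber jar.
spark-submit \
  --packages org.elasticsearch:elasticsearch-spark_2.10:2.1.0 \
  --class com.example.MyDriver \
  target/scala-2.10/my-driver.jar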