I'm using Spark 1.4.0 and just started experimenting with elasticsearch-hadoop.
I don't want to rely on Spark adding libraries at runtime, so I build an uber jar whenever I write a new driver.
I added "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" to build.sbt, and ran "sbt assembly", and met issue from deduplication.
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/heartsavior/.ivy2/cache/com.esotericsoftware.kryo/kryo/bundles/kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class
/Users/heartsavior/.ivy2/cache/com.esotericsoftware.minlog/minlog/jars/minlog-1.2.jar:com/esotericsoftware/minlog/Log$Logger.class
I tried excluding spark-core from elasticsearch-spark, with no luck.
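The exclusion looked roughly like this (a sketch; the spark-core_2.10 artifact name assumes a Scala 2.10 build):

libraryDependencies += ("org.elasticsearch" %% "elasticsearch-spark" % "2.1.0")
  .exclude("org.apache.spark", "spark-core_2.10") // keep Spark's own classes out of the uber jar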
So I'd like to know the best practice for excluding libraries so that I can maintain an uber jar that contains elasticsearch-spark.
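For reference, the usual sbt-assembly workaround for this particular collision is to keep only one copy of the duplicated minlog classes. A minimal sketch, assuming sbt-assembly 0.13+ and its assemblyMergeStrategy key (older plugin versions use mergeStrategy in assembly):

// kryo bundles its own copy of the com.esotericsoftware.minlog classes,
// which collide with the standalone minlog jar; keeping the first copy
// is usually safe when the two versions are compatible.
assemblyMergeStrategy in assembly := {
  case PathList("com", "esotericsoftware", "minlog", _*) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}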
I had the same problem and ended up just adding the elasticsearch .jar as an "unmanaged" dependency (just placing it in the lib/ folder of my project). Hope that works for you too.
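No build.sbt change is strictly needed for that: lib/ is sbt's default unmanagedBase, so any jar dropped there lands on the classpath and in the assembly. Making it explicit (the jar file name below is just an example and may differ):

// lib/ is already the default; shown only for clarity.
// e.g. place lib/elasticsearch-spark_2.10-2.1.0.jar in the project
unmanagedBase := baseDirectory.value / "lib"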
Btw, elasticsearch-spark is available as a Spark package, so as of Spark 1.2 you can simply specify it on the command line when submitting your job.
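Something like the following, assuming the artifact's Maven coordinates (the driver class and jar names are placeholders):

# --packages resolves the artifact and its dependencies at submit time,
# so elasticsearch-spark no longer has to live inside the uber jar.
spark-submit \
  --packages org.elasticsearch:elasticsearch-spark_2.10:2.1.0 \
  --class com.example.MyDriver \
  target/scala-2.10/my-driver.jar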