Unshaded joda-time classes in elasticsearch-2.1.1.jar


#1

It looks like there are unshaded joda-time classes in elasticsearch-2.1.1.jar.
The following is my build.sbt.

name := "MyApp"
version := "1.0"
scalaVersion := "2.10.6"
val sparkVers = "1.5.1"

resolvers ++= Seq(
  "Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/",
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots")

// Base Spark-provided dependencies
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVers % "provided"

// Elasticsearch integration
libraryDependencies ++= Seq(
  ("org.elasticsearch" % "elasticsearch-spark_2.10" % "2.1.2").
    exclude("org.apache.hadoop", "hadoop-yarn-api").
    exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish").
    exclude("org.eclipse.jetty.orbit", "javax.servlet").
    exclude("org.slf4j", "slf4j-api").
    exclude("joda-time", "joda-time")
)

libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.1.1"

// Skip tests when assembling fat JAR
test in assembly := {}

// Exclude jars that conflict with Spark (see https://github.com/sbt/sbt-assembly)
libraryDependencies ~= { _ map {
  case m if Seq("org.elasticsearch").contains(m.organization) =>
    m.exclude("commons-logging", "commons-logging").
      exclude("commons-collections", "commons-collections").
      exclude("commons-beanutils", "commons-beanutils-core").
      exclude("com.esotericsoftware.minlog", "minlog").
      exclude("org.apache.commons", "commons-lang3").
      exclude("org.apache.spark", "spark-network-common_2.10") // This conflicts with guava
  case m => m
}}

dependencyOverrides += "org.scala-lang" % "scala-compiler" % scalaVersion.value
dependencyOverrides += "org.scala-lang" % "scala-library" % scalaVersion.value
dependencyOverrides += "commons-net" % "commons-net" % "3.1"

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

I get the following error when running sbt clean assembly:

java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/DynamicScope/.ivy2/cache/org.elasticsearch/elasticsearch/jars/elasticsearch-2.1.1.jar:org/joda/time/base/BaseDateTime.class
/Users/DynamicScope/.ivy2/cache/joda-time/joda-time/jars/joda-time-2.8.2.jar:org/joda/time/base/BaseDateTime.class

Does anyone know how to solve this duplicate conflict?
I need to use val dateTime = new DateTime() for my application.

Thanks.

Updates

I solved the conflict issue by adding the following to assemblyMergeStrategy.

case PathList("org", "joda", "time", "base", "BaseDateTime.class") => MergeStrategy.first
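
For reference, here is a sketch of how the whole assemblyMergeStrategy block might look with that case added, assuming the same sbt-assembly syntax as the build.sbt above. One caveat: MergeStrategy.first keeps whichever copy comes first in classpath order, so which jar's BaseDateTime ends up in the fat JAR depends on that order.

```scala
// build.sbt fragment: merge rules for the fat JAR
assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.discard
  // elasticsearch-2.1.1.jar and joda-time-2.8.2.jar both contain this class;
  // keep a single copy instead of failing on the duplicate.
  case PathList("org", "joda", "time", "base", "BaseDateTime.class") => MergeStrategy.first
  case x =>
    // Fall back to the default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```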


(David Pilato) #2

You should read this: https://www.elastic.co/blog/to-shade-or-not-to-shade

Edit: forget it. Wrong answer.

Well, indeed, elasticsearch provides its own version of this class. It needs to be loaded first and ignored by any jar hell checker.
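
If you want to check which copy is actually picked up at runtime, one option is to ask the JVM where it loaded the class from. This is only a minimal sketch: the object name WhichJoda is made up for illustration, and it assumes the class is on the classpath (from either jar).

```scala
import org.joda.time.base.BaseDateTime

// Hypothetical helper: prints the jar (or directory) BaseDateTime was loaded from.
object WhichJoda {
  def main(args: Array[String]): Unit = {
    // getCodeSource and getLocation may both return null, so wrap them in Option.
    val location = Option(classOf[BaseDateTime].getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("unknown")
    println(s"BaseDateTime loaded from: $location")
  }
}
```

When run from an assembled fat JAR, the location will be the fat JAR itself, so this is mostly useful on an unassembled classpath.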


#3

Isn't it bad practice to ship unshaded classes that also exist in a different jar, since conflicts like this are hard to track down?


(Petey Pab Pro) #4

Do you know why Elasticsearch provides its own version? If that is removed and we just rely on the external joda dependency, will this break something? I have a similar question about the Lucene classes that are packaged in the jar.


(Jörg Prante) #5

The original Joda BaseDateTime class performs about 6x worse because its fields are declared volatile; see this discussion:

http://cs.oswego.edu/pipermail/concurrency-interest/2011-August/008112.html


(Petey Pab Pro) #6

Thanks, that's helpful. Do you know if this will affect client performance in any way? Or is it really only important for server side operations? In other words, if I run the Java client with the standard joda time library, do you think this will have a substantial impact on performance?


(Jörg Prante) #7

Joda date/time parsing is only relevant to field mapping processing on the server side. Clients do not use Joda at all (unless you use custom Joda-dependent code in the client, of course).

