Scala SBT install issues


(Jonathan Spooner) #1

Has anyone had issues adding the elasticsearch-hadoop 5.0.0 library to an SBT build?

name := "Insights Data"
version := "1.0.0"
organization := "me"
scalaVersion := "2.11.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.1" % "provided",
  "com.databricks" % "spark-csv_2.11" % "1.5.0" % "provided",
  "mysql" % "mysql-connector-java" % "5.1.40" % "provided",
  "org.elasticsearch" % "elasticsearch-hadoop" % "5.0.0" % "provided"
)
resolvers += Resolver.mavenLocal


sbt clean package    
[info] Set current project to Insights Data (in build file:/Users/jspooner/v/insights_spark/)
[success] Total time: 0 s, completed Nov 4, 2016 5:03:27 PM
[info] 	[SUCCESSFUL ] org.apache.spark#spark-catalyst_2.11;2.0.1!spark-catalyst_2.11.jar (2804ms)
[info] Done updating.
[error] Modules were resolved with conflicting cross-version suffixes in {file:/Users/jspooner/v/insights_spark/}insights_spark:
[error]    org.apache.spark:spark-launcher _2.11, _2.10
[error]    org.apache.spark:spark-sketch _2.11, _2.10
[error]    org.json4s:json4s-ast _2.11, _2.10
[error]    org.apache.spark:spark-catalyst _2.11, _2.10
[error]    org.apache.spark:spark-network-shuffle _2.11, _2.10
[error]    org.scalatest:scalatest _2.11, _2.10
[error]    com.twitter:chill _2.11, _2.10
[error]    org.apache.spark:spark-sql _2.11, _2.10
[error]    org.json4s:json4s-jackson _2.11, _2.10
[error]    com.fasterxml.jackson.module:jackson-module-scala _2.11, _2.10
[error]    org.json4s:json4s-core _2.11, _2.10
[error]    org.apache.spark:spark-unsafe _2.11, _2.10
[error]    org.apache.spark:spark-tags _2.11, _2.10
[error]    org.apache.spark:spark-core _2.11, _2.10
[error]    org.apache.spark:spark-network-common _2.11, _2.10
java.lang.RuntimeException: Conflicting cross-version suffixes in: org.apache.spark:spark-launcher, org.apache.spark:spark-sketch, org.json4s:json4s-ast, org.apache.spark:spark-catalyst, org.apache.spark:spark-network-shuffle, org.scalatest:scalatest, com.twitter:chill, org.apache.spark:spark-sql, org.json4s:json4s-jackson, com.fasterxml.jackson.module:jackson-module-scala, org.json4s:json4s-core, org.apache.spark:spark-unsafe, org.apache.spark:spark-tags, org.apache.spark:spark-core, org.apache.spark:spark-network-common
	at scala.sys.package$.error(package.scala:27)
	at sbt.ConflictWarning$.processCrossVersioned(ConflictWarning.scala:46)
	at sbt.ConflictWarning$.apply(ConflictWarning.scala:32)
	at sbt.Classpaths$$anonfun$66.apply(Defaults.scala:1164)
	at sbt.Classpaths$$anonfun$66.apply(Defaults.scala:1161)
	at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
	at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
	at sbt.std.Transform$$anon$4.work(System.scala:63)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
	at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
	at sbt.Execute.work(Execute.scala:235)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
	at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
	at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[error] (*:update) Conflicting cross-version suffixes in: org.apache.spark:spark-launcher, org.apache.spark:spark-sketch, org.json4s:json4s-ast, org.apache.spark:spark-catalyst, org.apache.spark:spark-network-shuffle, org.scalatest:scalatest, com.twitter:chill, org.apache.spark:spark-sql, org.json4s:json4s-jackson, com.fasterxml.jackson.module:jackson-module-scala, org.json4s:json4s-core, org.apache.spark:spark-unsafe, org.apache.spark:spark-tags, org.apache.spark:spark-core, org.apache.spark:spark-network-common
[error] Total time: 28 s, completed Nov 4, 2016 5:03:54 PM

(Jonathan Spooner) #2

It appears this cross-version error occurs because elasticsearch-hadoop is only compiled against Scala 2.10. I read that this is because Spark is on Scala 2.10; however, the latest EMR 5.0.3 release uses Scala 2.11.

I think my only option is to build my jar with Scala version 2.10 until elasticsearch-hadoop is released with Scala 2.11 support?

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
         
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)

(Jonathan Spooner) #3

I rolled my development box back to Scala 2.10:

brew install scala210
brew link scala210 --force

The application compiles fine, but when run on EMR 5.0.1 I get this error:

16/11/07 14:15:45 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
	at com.verve.insights.sightings.StepThree$.main(StepThree.scala:79)
	at com.verve.insights.sightings.StepThree.main(StepThree.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
16/11/07 14:15:45 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: 

I found a few SO posts on this issue, so I upgraded SBT to the latest version, 0.13.13, but the issue remains.

Note that everything was working fine in spark-shell on EMR 5.0.3.


(Jonathan Spooner) #4

I've also tried downloading the JARs and passing them directly to spark-submit with --jars /home/hadoop/elasticsearch-spark-20_2.10-5.0.0.jar, but this yields the same java.lang.NoSuchMethodError.

I've also tried using spark-submit --packages without success.

I noticed that the downloaded JARs do include a version for Scala 2.11. I'm curious why this version is not in Maven, where it would be available as a package.


(Jonathan Spooner) #5

Since it is not available on Maven, I was thinking I could add it to my local Maven repository; however, the downloads do not contain the needed pom file.

[warn] ==== Maven2 Local: tried
[warn]   file:/Users/jspooner/.m2/repository/org/elasticsearch/elasticsearch-hadoop_2.11/5.0.0/elasticsearch-hadoop_2.11-5.0.0.pom

(Jonathan Spooner) #6

I'm currently able to build my project with Scala 2.10; however, when I deploy my JAR to EMR emr-5.1.0, which uses Scala 2.11.8, my DataFrame is not responding to a map call.

scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.1" % "provided"
)
libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % "5.0.0"

Error

16/11/07 19:55:12 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
	at com.verve.insights.sightings.StepThree$.main(StepThree.scala:82)

On line 82 I have a map call that I believe is undefined.

val myDS = myDF.map {
      case Row(device_id: String) => Device(device_id)
    }
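
A note on the error above: a NoSuchMethodError on scala.reflect.api.JavaUniverse.runtimeMirror most likely points to a jar compiled against Scala 2.10 running on a Scala 2.11 runtime, rather than to map itself being undefined. In Spark 2.0, map on a DataFrame needs an implicit Encoder for the result type, and deriving one for a case class goes through Scala reflection. The following is a minimal sketch of the same map call, assuming a local SparkSession and made-up sample data; MapSketch, toDevices and the sample values are hypothetical names for illustration, not the original job's code.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Defined at top level so Spark can derive an Encoder for it.
case class Device(device_id: String)

object MapSketch {
  // Hypothetical helper: turns a one-column DataFrame into a Dataset[Device].
  def toDevices(spark: SparkSession, df: DataFrame): Dataset[Device] = {
    import spark.implicits._ // supplies the implicit Encoder[Device] that map requires
    df.map { case Row(device_id: String) => Device(device_id) }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying the snippet outside a cluster.
    val spark = SparkSession.builder()
      .appName("map-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val myDF = Seq("a1", "b2").toDF("device_id") // made-up sample rows
    toDevices(spark, myDF).show()

    spark.stop()
  }
}
```

If a sketch like this runs when built and executed with the same Scala version but fails on the cluster, the mismatch between the build's scalaVersion and the cluster's Scala runtime is the first thing to check.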

(Jonathan Spooner) #7

After a few days of attempting to upgrade to the 5.0 release of elasticsearch-hadoop, I'm afraid I have to abort the upgrade. If you guys have any updates on Scala and SBT support, please let me know.


(James Baiera) #8

We release a version of just the es-spark library that is compiled against Scala 2.11. It should be in Maven as org.elasticsearch:elasticsearch-spark-20_2.11:5.0.0. I'm able to see it on MvnRepository here: https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-20_2.11

Hope this helps.
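
For readers using SBT, the coordinate above could be declared roughly as follows. This is a sketch that reuses the Spark and Scala versions quoted earlier in this thread, not a tested build. The %% operator appends the Scala binary suffix derived from scalaVersion, so the connector and the Spark modules resolve to the same _2.11 suffix.

```scala
// build.sbt (sketch; versions are the ones mentioned in this thread)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.1" % "provided",
  // With scalaVersion 2.11.x this resolves to
  // org.elasticsearch:elasticsearch-spark-20_2.11:5.0.0
  "org.elasticsearch" %% "elasticsearch-spark-20" % "5.0.0"
)
```

Note that a plain sbt package does not bundle dependencies into the application jar, so the connector would still have to reach the cluster some other way, for example through spark-submit --packages with the same _2.11 coordinate.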


(James Baiera) #9

We also have a jar that supports Spark versions 1.3-1.6 compiled with Scala 2.11 in the form of org.elasticsearch:elasticsearch-spark-13_2.11:5.0.0, for other readers that may have this problem.


(Jonathan Spooner) #10

@james.baiera Thanks for the response. I verified that my jar runs locally with Spark 2.0.1 and Scala 2.11. I also verified that my job runs on EMR when I comment out EsSpark.saveToEs, and I'm able to print out my data frames without any error. I also verified that my EMR cluster has access to my Elasticsearch cluster. However, when I add the EsSpark.saveToEs call to my job, I get the following error.

Debug code

deviceDS.show()
println(s"$esDeviceIndex/device")
EsSpark.saveToEs(deviceDS.rdd, s"$esDeviceIndex/device", esConfig)

build.sbt

scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.1" % "provided"
)
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.0" % "provided"

Run job

spark-submit \
  --packages org.elasticsearch:elasticsearch-spark-20_2.10:5.0.0 \
  --class com.verve.insights.sightings.StepThree \
  s3://mypath/insights-data_2.11-1.0.2.jar

Error from emr-5.1.0

16/11/11 15:05:13 WARN TaskSetManager: Lost task 31.0 in stage 4.0 (TID 238, ip-10-2-21-230.aws.vrv): java.lang.NoSuchMethodError: 
scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
	at org.elasticsearch.spark.serialization.ReflectionUtils$.org$elasticsearch$spark$serialization$ReflectionUtils$$checkCaseClass(ReflectionUtils.scala:42)
	at org.elasticsearch.spark.serialization.ReflectionUtils$$anonfun$checkCaseClassCache$1.apply(ReflectionUtils.scala:84)
	at org.elasticsearch.spark.serialization.ReflectionUtils$$anonfun$checkCaseClassCache$1.apply(ReflectionUtils.scala:83)
	at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
	at org.elasticsearch.spark.serialization.ReflectionUtils$.checkCaseClassCache(ReflectionUtils.scala:83)
	at org.elasticsearch.spark.serialization.ReflectionUtils$.isCaseClass(ReflectionUtils.scala:102)
	at org.elasticsearch.spark.serialization.ScalaMapFieldExtractor$$anonfun$extractField$1.apply$mcVI$sp(ScalaMapFieldExtractor.scala:37)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
	at org.elasticsearch.spark.serialization.ScalaMapFieldExtractor.extractField(ScalaMapFieldExtractor.scala:33)
	at org.elasticsearch.hadoop.serialization.field.ConstantFieldExtractor.field(ConstantFieldExtractor.java:36)
	at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory$FieldWriter.write(AbstractBulkFactory.java:94)
	at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.writeTemplate(TemplatedBulk.java:80)
	at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:56)
	at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:159)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
	at org.apache.spark.scheduler.Task.run(Task.scala:86)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

(Jonathan Spooner) #11

@james.baiera Forget that last question; I was using the 2.10 version in the spark-submit --packages option.
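
A mismatch like this is easy to miss, because the application jar and the --packages coordinate each carry their own Scala suffix. One way to catch it early might be a small check that compares a coordinate's suffix with the Scala version actually running. The sketch below assumes nothing beyond the Scala standard library; SuffixCheck, binaryVersion and matches are hypothetical names introduced here for illustration.

```scala
object SuffixCheck {
  // Reduces a full Scala version to its binary version, e.g. "2.11.8" -> "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when a coordinate of the form group:artifact:version has an
  // artifact name ending in the given Scala binary suffix (e.g. "_2.11").
  def matches(coordinate: String, binary: String): Boolean =
    coordinate.split(':') match {
      case Array(_, artifact, _) => artifact.endsWith("_" + binary)
      case _                     => false // not a three-part coordinate
    }

  def main(args: Array[String]): Unit = {
    // Scala version of the JVM this code is running on.
    val runtime = binaryVersion(scala.util.Properties.versionNumberString)

    // The two connector coordinates that appear in this thread.
    val coordinates = Seq(
      "org.elasticsearch:elasticsearch-spark-20_2.10:5.0.0",
      "org.elasticsearch:elasticsearch-spark-20_2.11:5.0.0"
    )

    coordinates.foreach { c =>
      println(s"$c matches Scala $runtime: ${matches(c, runtime)}")
    }
  }
}
```

Run on the cluster, for example from spark-shell, this reports which of the two coordinates lines up with the Scala runtime there.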

