Elasticsearch Spark on YARN - Hadoop classpath


(Saurabh Malviya) #1

Hi,

We are running Spark on YARN with es-hadoop on AWS EMR 4.8.2. The issue seems to be that es-hadoop uses Hadoop 2.2 while EMR uses Hadoop 2.7.3. We are getting the exception below. After drilling down, we found that es-hadoop uses Hadoop 2.2, and because of this jobs fail randomly on EMR. It seems the order of the classpath is causing the problem. Has anyone faced the same problem? Please share a sample sbt file that excludes only Hadoop.

This is what I am trying:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "com.google.guava" % "guava" % "18.0",
  // es-hadoop with individual exclusions; intransitive() additionally
  // drops all of its transitive dependencies
  ("org.elasticsearch" % "elasticsearch-hadoop" % "5.0.0-alpha4")
    .exclude("org.apache.hadoop", "hadoop-yarn-api")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.slf4j", "slf4j-api")
    .intransitive(),
  //excludeAll ExclusionRule(organization = "org.apache.hadoop"),
  "org.elasticsearch" % "elasticsearch" % "2.3.4",
  "joda-time" % "joda-time" % "2.7",
  "com.databricks" %% "spark-xml" % "0.3.3"
)
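
For reference, the "exclude only Hadoop" variant asked about above might look roughly like the fragment below. This is only a sketch, assuming the Hadoop 2.2 jars really do arrive transitively through elasticsearch-hadoop; it has not been verified against EMR 4.8.2. It uses sbt's `excludeAll` with an `ExclusionRule` on the `org.apache.hadoop` organization, and leaves out `intransitive()`, since that call drops every transitive dependency and makes the individual exclusions redundant.

```scala
// build.sbt fragment (sketch, not verified on EMR):
// keep elasticsearch-hadoop's other transitive dependencies, but drop
// everything published under the org.apache.hadoop organization so that
// the Hadoop jars provided by the cluster are the only ones on the classpath.
libraryDependencies += ("org.elasticsearch" % "elasticsearch-hadoop" % "5.0.0-alpha4")
  .excludeAll(ExclusionRule(organization = "org.apache.hadoop"))
```

Whether this removes the conflict depends on where the older Hadoop classes actually come from; inspecting the resolved dependency tree or the contents of the assembled jar would confirm it.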

https://issues.apache.org/jira/browse/OOZIE-2389

