How to fix guava version conflicts with Hadoop YARN classpath?


I've spent several hours trying to fix a problem with the Elasticsearch v2.2 Java library on a Cloudera Hadoop cluster (v5.5.1).

The problem:

guava-18.0.jar is not being picked up by Elasticsearch v2.2 when it is used from a Pig (v0.11.0) UDF. As a result, the job fails with the stack trace below.

I've read the "To Shade or not To Shade" article and implemented its solution; however, I still get the exception below.

How do I make Elasticsearch v2.2 use guava-18.0.jar while other Hadoop projects use older Guava versions on the classpath?

2016-03-24 17:52:17,450 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError:;
at org.elasticsearch.threadpool.ThreadPool.(
at org.elasticsearch.client.transport.TransportClient$
at com.nextbigsound.find.FindConfig.getElasticSearchClient(
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.(
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.(
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(
at org.apache.hadoop.mapred.YarnChild$
at Method)
at org.apache.hadoop.mapred.YarnChild.main(

Having exactly the same issue.

Just solved it. Apparently the Hadoop version I was running the bulk loader on had an earlier version of the Guava jar on the classpath.
I was able to solve it by setting
mapreduce.job.user.classpath.first to true

The point is: make sure there is no Guava jar on the classpath other than the one you supply.
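
If you are not sure which Guava jar the task actually picks up, a small check like the one below may help. It is only a sketch: the class name ClassOrigin and the default class name in main are illustrative choices of mine, not something from the original setup. It prints the jar or directory a class was loaded from, using only standard JDK calls.

```java
import java.security.CodeSource;

/** Reports which jar or directory a class was loaded from (hypothetical helper). */
final class ClassOrigin {

    /** Returns the code-source location of the class, or a fallback label. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the JVM's own runtime usually have no code source.
        if (source == null || source.getLocation() == null) {
            return "(no code source; probably a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Example class name only; pass the class named in your own stack trace.
        String name = args.length > 0
                ? args[0]
                : "com.google.common.util.concurrent.MoreExecutors";
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> type = Class.forName(name, false, ClassOrigin.class.getClassLoader());
            System.out.println(name + " loaded from " + describe(type));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Calling describe from inside the UDF and logging the result shows what the task JVM sees, which can differ from the classpath on the machine that submits the job.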

The way I solved this issue was by using the es-hadoop connector from Elasticsearch. It avoids the Guava version issue entirely and, in my case, replaces a custom Elasticsearch implementation.

I'll write a blog post and share how I solved the issue using Pig and es-hadoop.

How and where do you set it? Please explain in detail.

We also have the same issue. Any further details?

The issue can be fixed by relocating the Guava classes with Maven and declaring guava 18.0 as a dependency in the pom.xml file.
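
For anyone asking what relocation looks like, here is a rough pom.xml fragment using the maven-shade-plugin. Treat it as a sketch under assumptions: the plugin version and the shaded.com.google.common prefix are placeholders I picked, not values from the original project.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Assumed version; use whichever release your build already pins. -->
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Move Guava's packages under a private prefix (placeholder name). -->
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocation only rewrites classes that end up inside the shaded jar, so the Elasticsearch client and Guava 18.0 both need to be bundled into it for their references to be rewritten together.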

We are facing the same issue. Can you provide more details on how to set it? Here is my build.sbt:

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming-kafka" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.elasticsearch" % "elasticsearch-hadoop" % "5.0.0-alpha4",
"org.elasticsearch" % "elasticsearch" % "2.3.4",
"org.elasticsearch.plugin" % "shield" % "2.3.4" from "",
"joda-time" % "joda-time" % "2.7",
"com.databricks" %% "spark-xml" % "0.3.3",
"com.sun.jersey" % "jersey-servlet" % "1.19",
"" % "guava" % "18.0",
"com.amazonaws" % "aws-java-sdk" % "1.11.26",
"com.typesafe" % "config" % "1.3.0",
"com.databricks" %% "spark-csv" % "1.4.0",
"org.apache.spark" %% "spark-mllib" % sparkVersion
)


resolvers ++= Seq(
"Akka Repository" at "",
"scala-tools" at ""
//"elasticsearch-releases" at ""
)

Please refer to this link here to relocate the jar.