Spark ElasticSearch


(Gaurav Prasad) #1

While fetching records from Elasticsearch using Spark, I'm getting the below error.

```
Caused by: org.apache.spark.util.TaskCompletionListenerException: ActionRequestValidationException[Validation Failed: 1: no scroll ids specified;]
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

The code reads as below:

```scala
val esConf = Map(
  ConfigurationOptions.ES_NODES -> "n01ssl101.aap.csaa.pri",
  ConfigurationOptions.ES_PORT -> "9200",
  ConfigurationOptions.ES_HTTP_TIMEOUT -> "5m")

sqlContext.read.format("es").options(esConf).load("contract/data").collect
```


(Costin Leau) #2

Looks like the read cannot be completed for some reason. Can you provide more information about your setup and what library you are using?
See this page for more information.


(Gaurav Prasad) #3

Hi Costin,

Thanks for the response. The ES jar dependency is `"org.elasticsearch" % "elasticsearch-spark_2.10" % "2.2.0"`. Here is the code:

```scala
val esConf = Map(
  ConfigurationOptions.ES_NODES -> "n01ssl101.aap.csaa.pri",
  ConfigurationOptions.ES_PORT -> "9200",
  ConfigurationOptions.ES_HTTP_TIMEOUT -> "5m")
// var esContractRDD = EsSpark.esJsonRDD(sc, "contract/data", esConf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("es").options(esConf).load("contract/data")
```

And here is the sbt file, which shows the versions:

```scala
name := "UpadteMembership"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.5.2",
  "org.apache.hadoop" % "hadoop-common" % "2.7.1" excludeAll ExclusionRule(organization = "javax.servlet"),
  "org.apache.spark" % "spark-sql_2.10" % "1.5.2",
  "org.apache.spark" % "spark-hive_2.10" % "1.5.2",
  "org.apache.spark" % "spark-yarn_2.10" % "1.5.2",
  "org.elasticsearch" % "elasticsearch-spark_2.10" % "2.2.0",
  "com.databricks" % "spark-csv_2.10" % "1.3.0",
  "com.databricks" % "spark-xml_2.10" % "0.3.2",
  "log4j" % "log4j" % "1.2.17",
  "org.apache.spark" % "spark-streaming_2.10" % "1.5.2"
)

resolvers ++= Seq(
  "Hortonworks Repository" at "http://repo.hortonworks.com/content/repositories/releases/",
  "JBoss Repository" at "http://repository.jboss.org/nexus/content/repositories/releases/",
  "Spray Repository" at "http://repo.spray.cc/",
  "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Twitter4J Repository" at "http://twitter4j.org/maven2/",
  "Apache HBase" at "https://repository.apache.org/content/repositories/releases",
  "Twitter Maven Repo" at "http://maven.twttr.com/",
  "scala-tools" at "https://oss.sonatype.org/content/groups/scala-tools",
  "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/",
  "Second Typesafe repo" at "http://repo.typesafe.com/typesafe/maven-releases/",
  "Mesosphere Public Repository" at "http://downloads.mesosphere.io/maven",
  Resolver.sonatypeRepo("public")
)
```


(Costin Leau) #4

Looks like you are mixing Scala 2.10 and 2.11: `scalaVersion` is set to `2.11.8`, while all the dependencies use `_2.10` artifacts. Make sure to use only one Scala version.
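One way this could be resolved, sketched below on the assumption that everything stays on Scala 2.10 (since the `_2.10` artifacts from the original build file are used): pin `scalaVersion` to a 2.10.x release and switch to the `%%` operator, which makes sbt append the matching Scala suffix automatically so the build can't silently mix suffixes again. Versions are taken from the original build file.

```scala
// build.sbt — minimal sketch, assuming the project stays on Scala 2.10.
// %% appends "_2.10" to each artifact name based on scalaVersion,
// so every dependency is guaranteed to match the project's Scala version.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"          % "1.5.2",
  "org.apache.spark"  %% "spark-sql"           % "1.5.2",
  "org.elasticsearch" %% "elasticsearch-spark" % "2.2.0"
)
```

The alternative direction (keeping `scalaVersion := "2.11.8"` and moving every dependency to a `_2.11` artifact) works the same way, provided 2.11 builds of all the listed libraries exist for those versions.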

