Spark ElasticSearch


(Gaurav Prasad) #1

While fetching records from Elasticsearch using Spark, I'm getting the below error.

```
Caused by: org.apache.spark.util.TaskCompletionListenerException: ActionRequestValidationException[Validation Failed: 1: no scroll ids specified;]
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

The code reads as below:

```scala
val esConf = Map(
  ConfigurationOptions.ES_NODES -> "n01ssl101.aap.csaa.pri",
  ConfigurationOptions.ES_PORT -> "9200",
  ConfigurationOptions.ES_HTTP_TIMEOUT -> "5m")

sqlContext.read.format("es").options(esConf).load("contract/data").collect
```


(Costin Leau) #2

Looks like the read cannot be completed for some reason. Can you provide more information about your setup and what library you are using?
See this page for more information.


(Gaurav Prasad) #3

Hi Costin,

Thanks for the response. The ES jar dependency is `"org.elasticsearch" % "elasticsearch-spark_2.10" % "2.2.0"`. Here is the code:

```scala
val esConf = Map(
  ConfigurationOptions.ES_NODES -> "n01ssl101.aap.csaa.pri",
  ConfigurationOptions.ES_PORT -> "9200",
  ConfigurationOptions.ES_HTTP_TIMEOUT -> "5m")
// var esContractRDD = EsSpark.esJsonRDD(sc, "contract/data", esConf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("es").options(esConf).load("contract/data")
```

And here is the sbt file, which shows the versions:

```scala
name := "UpadteMembership"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.5.2",
  "org.apache.hadoop" % "hadoop-common" % "2.7.1" excludeAll ExclusionRule(organization = "javax.servlet"),
  "org.apache.spark" % "spark-sql_2.10" % "1.5.2",
  "org.apache.spark" % "spark-hive_2.10" % "1.5.2",
  "org.apache.spark" % "spark-yarn_2.10" % "1.5.2",
  "org.elasticsearch" % "elasticsearch-spark_2.10" % "2.2.0",
  "com.databricks" % "spark-csv_2.10" % "1.3.0",
  "com.databricks" % "spark-xml_2.10" % "0.3.2",
  "log4j" % "log4j" % "1.2.17",
  "org.apache.spark" % "spark-streaming_2.10" % "1.5.2"
)

resolvers ++= Seq(
  "Hortonworks Repository" at "http://repo.hortonworks.com/content/repositories/releases/",
  "JBoss Repository" at "http://repository.jboss.org/nexus/content/repositories/releases/",
  "Spray Repository" at "http://repo.spray.cc/",
  "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Twitter4J Repository" at "http://twitter4j.org/maven2/",
  "Apache HBase" at "https://repository.apache.org/content/repositories/releases",
  "Twitter Maven Repo" at "http://maven.twttr.com/",
  "scala-tools" at "https://oss.sonatype.org/content/groups/scala-tools",
  "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/",
  "Second Typesafe repo" at "http://repo.typesafe.com/typesafe/maven-releases/",
  "Mesosphere Public Repository" at "http://downloads.mesosphere.io/maven",
  Resolver.sonatypeRepo("public")
)
```


(Costin Leau) #4

Looks like you are mixing Scala 2.10 and 2.11: `scalaVersion` is set to `2.11.8`, while all the dependencies use `_2.10` artifacts. Make sure to use only one Scala version.
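One way this could be resolved, sketched below on the assumption that everything stays on Scala 2.10 (since the `_2.10` artifacts from the original build file are used): pin `scalaVersion` to a 2.10.x release and switch to the `%%` operator, which makes sbt append the matching Scala suffix automatically so the build can't silently mix suffixes again. Versions are taken from the original build file.

```scala
// build.sbt — minimal sketch, assuming the project stays on Scala 2.10.
// %% appends "_2.10" to each artifact name based on scalaVersion,
// so every dependency is guaranteed to match the project's Scala version.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"          % "1.5.2",
  "org.apache.spark"  %% "spark-sql"           % "1.5.2",
  "org.elasticsearch" %% "elasticsearch-spark" % "2.2.0"
)
```

The alternative direction (keeping `scalaVersion := "2.11.8"` and moving every dependency to a `_2.11` artifact) works the same way, provided 2.11 builds of all the listed libraries exist for those versions.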

