Spark elasticsearch 5.0.2 scala.MatchError

markcitizen · December 7, 2016, 9:10pm

Hello,
I'm using Spark 2.0 with the elasticsearch-spark connector, version 5.0.2 (Scala 2.11):
"org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.0.2"

When trying to read from an index I'm getting the following error:
scala.MatchError: Buffer() (of class scala.collection.convert.Wrappers$JListWrapper)

This is how I'm getting the data (session is an instance of SparkSession):
session.sqlContext.read.format("org.elasticsearch.spark.sql").options(opt).load(esIndexName)

opt is a Map of options:
Map("es.read.field.as.array.include" -> "fieldNames",
"es.input.json" -> "true",
"es.field.read.empty.as.null" -> "true",
"es.index.read.missing.as.empty" -> "true",
"es.read.field.exclude" -> excludedFields)

The field in question is not in an array; it's a nested object field:
a {
b {
c = "string value"
}
}
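The connector's field include/exclude settings generally address nested fields like this one by dotted path (a.b.c here). Here is a stdlib-only sketch of how such a path resolves against a document. It models the document as nested Maps, which is an assumption for illustration and not the connector's internal representation; NestedPath and lookup are hypothetical names.

```scala
object NestedPath {
  // Resolve a dotted path such as "a.b.c" one segment at a time; any missing
  // key or non-object value along the way yields None.
  def lookup(doc: Map[String, Any], path: String): Option[Any] =
    path.split('.').foldLeft(Option[Any](doc)) {
      case (Some(m: Map[_, _]), key) => m.asInstanceOf[Map[String, Any]].get(key)
      case _                         => None
    }

  def main(args: Array[String]): Unit = {
    // The nested object from the question: a { b { c = "string value" } }
    val doc: Map[String, Any] = Map("a" -> Map("b" -> Map("c" -> "string value")))
    println(lookup(doc, "a.b.c")) // prints Some(string value)
    println(lookup(doc, "a.x.c")) // prints None
  }
}
```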

I found a similar issue on GitHub (github.com/elastic/elasticsearch-hadoop), "Spark-ES Schema problem", opened by JoaquinSV on 14 Jan 2016 and closed on 24 Jan 2016:

Hello, I'm using the connector with Spark, and I'm trying to read fields that h…as arrays and Strings, like this:
Field_name ["value"]
value

When I set es.field.read.as.array.include = Field_name, it throws an error that some fields are not arrays, because some of them are Strings:
Caused by: scala.MatchError: value (of class java.lang.String)

And when I set es.field.read.as.array.exclude = Field_name, it throws an error that some fields are not Strings, because some of them are arrays:
Caused by: scala.MatchError: Buffer(["value"]) (of class scala.collection.convert.Wrappers$JListWrapper)

How can I solve this? My goal is to write indexes of ES into Parquet or JSON files using Spark.
Regards

However, even after upgrading to the latest elasticsearch-spark library version, I'm still seeing the problem.
I would appreciate any suggestions on how to fix this.
Thanks,

M

Spark stack trace:

WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, host): scala.MatchError: Buffer() (of class scala.collection.convert.Wrappers$JListWrapper)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:296)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:261)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:261)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter$$anonfun$toCatalystImpl$2.apply(CatalystTypeConverters.scala:164)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
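A scala.MatchError means a match expression received a value that none of its cases cover. In the trace, Spark's StringConverter is handed an empty Java list (Buffer()) for a column whose schema says string, and the StructConverter and ArrayConverter frames show that the field sits inside nested objects under an array. The stdlib-only sketch below reproduces the same failure mode with a hypothetical toStringValue function; it is not Spark's actual converter code.

```scala
import scala.collection.JavaConverters._

object MatchErrorDemo {
  // Hypothetical stand-in for a converter that only handles strings and nulls.
  // It has no case for lists, so a list value falls through every case.
  def toStringValue(value: Any): String = value match {
    case s: String => s
    case null      => null
  }

  def main(args: Array[String]): Unit = {
    println(toStringValue("string value"))

    // An empty java.util.List wrapped for Scala: the same kind of value as the
    // Buffer() reported in the trace.
    val emptyArray: Any = new java.util.ArrayList[String]().asScala
    try {
      toStringValue(emptyArray)
    } catch {
      case e: MatchError => println(s"scala.MatchError: ${e.getMessage}")
    }
  }
}
```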

markcitizen · December 12, 2016, 6:31pm

Hello,
I'm going to answer my own question.
I was not able to find a solution to this problem. I found some answers online, but they referred to older versions of the elasticsearch-spark library and said the problem was supposed to be fixed in the latest version.
Since the latest version was still buggy, I decided to skip the conversion process altogether: read the ES index as raw JSON and parse the data myself.

You can do that using code similar to this (Scala):
import org.elasticsearch.spark._ // adds esJsonRDD to SparkContext
val readCfg = Map("setting" -> "value")
val tuples = session.sparkContext.esJsonRDD("myEsIndex", readCfg)

"tuples" is a collection of Tuple2[String, String] items, where the first one is the entry index and the second one is the entry body (JSON text). You can parse JSON text using Playframework JSON library, for example.
I hope this helps,

M

system · January 9, 2017, 6:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
