Elasticsearch-spark: empty fields are returned as "None" when using EsSpark.esRDD()

(Arun) #1

I am using the latest build of elasticsearch-spark (2.2.0.BUILD-20150818.024341-55) with Scala 2.11. When I process results from EsSpark.esRDD(), I find that empty columns (or fields) are returned as the string "None" instead of as the empty string. This seems like a bug. Or is there some setting to ensure that I get the empty string and not "None"?


(Arun) #2

I found the solution: I needed to set "es.field.read.empty.as.null" to "no" (the default value is "yes").

  val esRdd: RDD[(String, Map[String, AnyRef])] =
    EsSpark.esRDD(sc, esIndexType, Map(
      "es.nodes" -> esHostPort,
      // treat empty fields as empty strings rather than null (read back as None)
      "es.field.read.empty.as.null" -> "no"))
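For reference, a minimal sketch of setting the same option once on the SparkConf so that every esRDD call inherits it, rather than passing it per call. The host, app name, and index/type resource here are placeholder assumptions, not from the thread:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.elasticsearch.spark.rdd.EsSpark

  val conf = new SparkConf()
    .setAppName("es-empty-fields")                 // placeholder app name
    .set("es.nodes", "localhost:9200")             // placeholder host:port
    .set("es.field.read.empty.as.null", "no")      // keep empty strings instead of null/None

  val sc = new SparkContext(conf)
  // placeholder "index/type" resource string
  val esRdd = EsSpark.esRDD(sc, "myindex/mytype")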

(Costin Leau) #3

Glad to see you sorted things out.
