Elasticsearch-spark: empty fields are returned as "None" when using EsSpark.esRDD()

(Arun) #1

I am using the latest build of elasticsearch-spark (2.2.0.BUILD-20150818.024341-55) with Scala 2.11. When I process results from EsSpark.esRDD(), I find that empty columns (or fields) are returned as the string "None" instead of as the empty string. This seems like a bug. Or is there some setting to ensure that I get the empty string and not "None"?


(Arun) #2

I found the solution: I needed to set "es.field.read.empty.as.null" to "no" (the default value is "yes").

  val esRdd: RDD[(String, Map[String, AnyRef])] =
    EsSpark.esRDD(sc, esIndexType, Map(
      "es.nodes" -> esHostPort,
      // treat empty fields as empty strings rather than null (read back as None)
      "es.field.read.empty.as.null" -> "no"))
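For reference, a minimal sketch of setting the same option once on the SparkConf so that every esRDD call inherits it, rather than passing it per call. The host, app name, and index/type resource here are placeholder assumptions, not from the thread:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.elasticsearch.spark.rdd.EsSpark

  val conf = new SparkConf()
    .setAppName("es-empty-fields")                 // placeholder app name
    .set("es.nodes", "localhost:9200")             // placeholder host:port
    .set("es.field.read.empty.as.null", "no")      // keep empty strings instead of null/None

  val sc = new SparkContext(conf)
  // placeholder "index/type" resource string
  val esRdd = EsSpark.esRDD(sc, "myindex/mytype")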

(Costin Leau) #3

Glad to see you sorted things out.
