Spark RDD.saveToES


(Pat Ferrel) #1

Writing an index from Spark works well if you construct the entire dataset, with all fields, before you write it using rdd.saveToEs. Is there a way to use this mechanism to upsert, changing an existing field? I want to change the value of one existing field without changing the rest of the document.

If I write an rdd whose Map elements contain only one field, won't the entire doc be deleted except for that one field?


(Costin Leau) #2

Depends on how you define the update operation; you can specify a script that changes only the value, as opposed to replacing the whole document.
Along with mapping.include/exclude, the configuration settings give you access to all the update options in Elasticsearch.
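
As a rough sketch of how those settings might be combined for the single-field case, assuming the elasticsearch-spark connector is on the classpath and a node is reachable at localhost:9200: the index/type name "items/item", the "id" field name, and the sample values are hypothetical. es.write.operation, es.mapping.id and es.mapping.exclude are connector settings; check their exact behavior against the configuration reference for your version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object PopularityUpsert {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("popularity-upsert")
      .setMaster("local[*]")
      .set("es.nodes", "localhost:9200") // assumption: local test node
    val sc = new SparkContext(conf)

    // Each element carries only the document id and the one field to change.
    val updates = sc.makeRDD(Seq(
      Map[String, Any]("id" -> "doc1", "popularity" -> 1.0d),
      Map[String, Any]("id" -> "doc2", "popularity" -> 0.25d)
    ))

    updates.saveToEs("items/item", Map(
      // "upsert" asks for a partial update (or an insert if the doc is missing)
      // instead of the default "index" operation, which replaces the document.
      "es.write.operation" -> "upsert",
      // Take the document id from the "id" entry of each Map...
      "es.mapping.id" -> "id",
      // ...and keep that entry out of the document body that is sent.
      "es.mapping.exclude" -> "id"
    ))

    sc.stop()
  }
}
```

With "update" instead of "upsert", documents that do not exist yet should be reported as errors rather than created.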


(Pat Ferrel) #3

Thanks, this is good to know, but I'm not sure these mappings help. First, I don't know anything about the structure of the document at the time I am trying to do the equivalent of upserting a double value into the doc properties.

This seems like a very simple use case where I'm adding a possibly new property to a doc, but rdd.saveToEs overwrites the entire doc with the Map in each rdd element.

The include/exclude docs seem to be talking about pruning unneeded data from a doc, so maybe I misunderstand things.

To be clear, one element of the rdd is something like a Scala tuple ("doc1", Map("popularity" -> 1.0d)). I know doc1 has other fields but only want to write the "popularity" double field. If I use the include mapping for "popularity", won't this just erase the rest of the doc?

Should I include all with * and give it the Map above? Will that leave all fields alone and overwrite the "popularity" field?


(Costin Leau) #4

I think you misunderstand how update works in Elasticsearch. ES-Spark doesn't change its semantics; rather, it exposes them in a way that's convenient in Spark.
Take a look at the Elasticsearch documentation. Start, for example, with the section in the reference guide on partial updates, which is what you are looking for.
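
To make the distinction concrete, here is a toy model in plain Scala of the two write modes. It is not Elasticsearch or ES-Spark code; the object and function names are made up for illustration, and it only merges top-level fields, whereas a real partial update also merges nested objects.

```scala
// Simplified model of "index" versus partial "update"; illustration only.
object UpdateSemantics {
  type Doc = Map[String, Any]

  // "index": the stored document is replaced wholesale by the incoming one.
  def index(stored: Doc, incoming: Doc): Doc = incoming

  // Partial "update": fields in the partial doc overwrite or extend the stored
  // document; fields that the partial doc does not mention are left untouched.
  def partialUpdate(stored: Doc, partial: Doc): Doc = stored ++ partial
}
```

For a stored doc with hypothetical fields "title" and "category", indexing Map("popularity" -> 1.0d) leaves only "popularity", while a partial update keeps "title" and "category" and adds or overwrites "popularity".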

