Writing an index from Spark works well if you construct the entire dataset, with all fields, before writing it using rdd.saveToEs. Is there a way to use this mechanism for upserts that change an existing field? I want to change the value of one field without touching the rest of the document.
If I write an RDD whose Map elements contain only one field, won't everything in the doc except that field be deleted?
It depends on how you define the update operation; you can specify a script that changes only that value, as opposed to replacing the whole document.
Together with include/exclude, the configuration settings give you access to all the update options in Elasticsearch.
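For what it's worth, here is a minimal sketch of the scripted-update route in Scala. The node address, index/type name, and id field are placeholders, and the exact configuration keys (es.update.script, es.update.script.params) and the script language have changed across es-hadoop and Elasticsearch versions, so check the docs for the version you're running:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings saveToEs into scope on RDDs

val conf = new SparkConf()
  .setAppName("es-scripted-update")
  .set("es.nodes", "localhost:9200")   // placeholder cluster address
  .set("es.write.operation", "update") // update existing docs instead of (re)indexing
  .set("es.mapping.id", "id")          // Map field to use as the document _id
  // The script touches only "popularity"; everything else in _source is left alone.
  .set("es.update.script", "ctx._source.popularity = pop")
  // Script param "pop" takes its value from the "popularity" field of the Map being written.
  .set("es.update.script.params", "pop:popularity")

val sc = new SparkContext(conf)

sc.makeRDD(Seq(Map("id" -> "doc1", "popularity" -> 1.0d)))
  .saveToEs("myindex/mytype") // placeholder index/type
```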
Thanks, this is good to know, but I'm not sure these mappings help. First, I don't know anything about the structure of the document at the time I'm trying to do the equivalent of upserting a double value into the doc's properties.
This seems like a very simple use case: I'm adding a possibly new property to a doc, but rdd.saveToEs overwrites the entire doc with the Map in each RDD element.
The include/exclude docs seem to be talking about pruning unneeded data from a doc, so maybe I misunderstand things.
To be clear, one element of the RDD is something like the Scala tuple ("doc1", Map("popularity" -> 1.0d)). I know doc1 has other fields but want to write only the "popularity" double field. If I use an include mapping for "popularity", won't this just erase the rest of the doc? Or should I include everything with * and give it the Map above? Will that leave all other fields alone and overwrite only the "popularity" field?
I think you misunderstand how update works in Elasticsearch. ES-Spark doesn't change its semantics; rather, it exposes them in a way that's convenient in Spark.
Take a look at the Elasticsearch documentation - start, for example, with this section of the reference guide on partial updates, which is what you are looking for.
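As a concrete illustration of a partial update from Spark (a sketch only; the node address and index/type name are placeholders, and behavior can vary by es-hadoop version): with es.write.operation set to upsert, the Map is sent as a partial document, so Elasticsearch merges the fields you provide into the existing doc rather than replacing it. With a pair RDD like yours, saveToEsWithMeta uses the tuple key as the document id:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings saveToEsWithMeta into scope on pair RDDs

val conf = new SparkConf()
  .setAppName("es-partial-upsert")
  .set("es.nodes", "localhost:9200") // placeholder cluster address

val sc = new SparkContext(conf)

// The tuple key ("doc1") becomes the document _id; the Map is the partial doc.
val updates = sc.makeRDD(Seq(("doc1", Map("popularity" -> 1.0d))))

// "upsert": the fields in the Map are merged into the existing document
// (or a new document is created if "doc1" does not exist yet);
// the other fields of doc1 are left untouched.
updates.saveToEsWithMeta("myindex/mytype", Map("es.write.operation" -> "upsert"))
```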