Spark RDD.saveToES

pferrel · July 29, 2015, 9:08pm

Writing an index from Spark works well if you construct the entire dataset, with all fields, before you write it using rdd.saveToEs. Is there a way to use this mechanism for upserting, to change an existing field? I want to change the value of an existing field without changing the rest of the document.

If I write an RDD whose Map elements contain only one field, won't the entire doc be replaced, leaving only that one field?

costin · August 13, 2015, 7:41pm

Depends on how you define the update operation; you can specify a script that changes only the value, as opposed to replacing the whole document.
Along with mapping.include/exclude, the configuration settings give you access to all the update options in Elasticsearch.
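
As a rough sketch of what such a configuration could look like from Spark: the settings `es.write.operation` and `es.mapping.id` are es-hadoop configuration keys, while the cluster address, the `myindex/mytype` resource, and the `id` field name are placeholders assumed for illustration only. It needs Spark, the elasticsearch-spark connector on the classpath, and a reachable cluster to actually run.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs

object PartialUpdateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("partial-update-sketch")
      .setMaster("local[*]")
      .set("es.nodes", "localhost:9200") // assumed cluster address

    val sc = new SparkContext(conf)

    // Each element carries the document id plus only the field to change.
    // "id" is a hypothetical field name chosen for this sketch.
    val updates = sc.makeRDD(Seq(
      Map("id" -> "doc1", "popularity" -> 1.0d)
    ))

    // "upsert" asks the connector to send update requests instead of
    // index requests; "es.mapping.id" names the field holding the doc id.
    updates.saveToEs("myindex/mytype", Map(
      "es.write.operation" -> "upsert",
      "es.mapping.id" -> "id"
    ))

    sc.stop()
  }
}
```

With the default `index` operation the same call would replace each document with the Map as given. Note that in this sketch the `id` field itself is also written into the document body.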

pferrel · August 14, 2015, 3:40pm

Thanks, this is good to know, but I'm not sure these mappings help. First, I don't know anything about the structure of the document at the time I am trying to do the equivalent of upserting a double value into the doc properties.

This seems like a very simple use case, where I'm adding a possibly new property to a doc, but rdd.saveToEs overwrites the entire doc with the Map in each RDD element.

The include/exclude docs seem to be talking about pruning unneeded data from a doc, so maybe I misunderstand things.

To be clear, one element of the RDD is something like a Scala tuple ("doc1", Map("popularity" -> 1.0d)). I know doc1 has other fields but only want to write the "popularity" double field. If I use an include mapping for "popularity", won't this just erase the rest of the doc?

Should I include all with * and give it the Map above? Will that leave all fields alone and overwrite the "popularity" field?

costin · August 18, 2015, 9:39am

I think you misunderstand how update works in Elasticsearch. ES-Spark doesn't change its semantics; rather, it exposes them in a way that's convenient in Spark.
Take a look at the Elasticsearch documentation; start, for example, with the section in the reference guide on partial updates, which is what you are looking for.
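
The difference between indexing and a partial update can be pictured with plain Scala maps. This is only a toy model of the semantics, not connector or Elasticsearch code; the object and method names are made up for illustration, and it merges top-level fields only, whereas Elasticsearch also merges nested objects.

```scala
object PartialUpdateSemantics {
  type Doc = Map[String, Any]

  // "index" semantics: the stored document is replaced wholesale
  // by whatever is sent.
  def index(existing: Doc, incoming: Doc): Doc = incoming

  // Partial-update semantics (top level only in this sketch): fields in
  // the partial doc overwrite or extend the stored doc; the rest is kept.
  def partialUpdate(existing: Doc, partial: Doc): Doc = existing ++ partial

  def main(args: Array[String]): Unit = {
    val stored: Doc = Map("title" -> "some title", "category" -> "books")
    val partial: Doc = Map("popularity" -> 1.0d)

    println(index(stored, partial))         // only "popularity" survives
    println(partialUpdate(stored, partial)) // all three fields present
  }
}
```

In these terms, the question in the thread is whether the write behaves like `index` or like `partialUpdate`; that is decided by the write operation configured for the job, not by the include/exclude mappings.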
