Updating/upserting using Cascading/Scalding with a pre-defined _id field

Hi. I've been creating Scalding Taps to read from and write to Elasticsearch
(using Cascading EsTaps underneath), and everything has been going fine when
writing documents with index operations
(es.write.operation=ES_OPERATION_INDEX). I am writing to a preexisting
resource that I created through a REST call whose mapping defines the _id field:

curl -XPUT http://host:port/test_index/?pretty -d '
{
  "mappings": {
    "test_document": {
      "_id": {
        "path": "some_id_field"
      }
    }
  }
}'

This allows my EsTap to automatically use the "some_id_field" (from the
cascading source) as the _id when it gets written to elasticsearch.
Unfortunately, when setting the es.write.operation property to
ES_OPERATION_UPDATE or ES_OPERATION_UPSERT, it no longer automatically uses
the some_id_field that I specified when I created the index. If I do not
specify the field with the es.mapping.id property (ES_MAPPING_ID), I get
either

Operation [%s] requires an id but none was given/found

or

Operation [%s] requires an id but none (%s) was specified

depending on whether I'm trying to update or upsert.
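
For reference, this is roughly the configuration that does work for me when I name the id field explicitly. It is a minimal sketch using only java.util.Properties: the class and method names are hypothetical, the string keys and values are what I understand the ES_WRITE_OPERATION, ES_OPERATION_UPSERT and ES_MAPPING_ID constants to resolve to, and how the properties reach the tap depends on your own setup.

```java
import java.util.Properties;

final class EsUpsertConfig {
    // Hypothetical helper: builds the settings for an upsert write.
    // idField is the source field that holds the document id.
    static Properties upsertProperties(String idField) {
        Properties props = new Properties();
        // Assumed value of ES_WRITE_OPERATION / ES_OPERATION_UPSERT
        props.setProperty("es.write.operation", "upsert");
        // Assumed value of ES_MAPPING_ID; without it the update/upsert fails
        props.setProperty("es.mapping.id", idField);
        return props;
    }

    public static void main(String[] args) {
        Properties props = upsertProperties("some_id_field");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The point is that "some_id_field" has to be repeated here even though the index mapping already declares it as the _id path.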

My question is: is it possible to avoid specifying an es.mapping.id
property when updating/upserting (which would be less painful, since we
aren't manually specifying fields when we load the data anyway)? Or can I
somehow retrieve the "path" value for _id and pass it along to the
Cascading tap for any update/upsert operation?
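
To make the second option concrete, here is a rough sketch of what I have in mind: fetch the mapping for the index over REST, pull the "path" out of the "_id" block, and feed that value into es.mapping.id. The class and method names are hypothetical, the shape of the sample mapping JSON is my assumption of what the server returns, and the regex is only a stand-in for a proper JSON parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdPathLookup {
    // Finds "_id": { ... "path": "<value>" ... } without crossing the
    // closing brace of the _id block.
    private static final Pattern ID_PATH = Pattern.compile(
        "\"_id\"\\s*:\\s*\\{[^}]*?\"path\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the configured _id path, or empty if the mapping has none.
    static Optional<String> idPath(String mappingJson) {
        Matcher m = ID_PATH.matcher(mappingJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed response body; the real one would come from the
        // mapping endpoint of test_index.
        String sample = "{\"test_index\":{\"mappings\":{\"test_document\":"
            + "{\"_id\":{\"path\":\"some_id_field\"}}}}}";
        System.out.println(idPath(sample).orElse("<no _id path>"));
    }
}
```

This would cost an extra REST round trip before every job, which is why I'd prefer the tap to pick up the path on its own if that is supported.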


