On Nov 28, 2014 10:20 PM, "Reason" <reason@bazaarvoice.com> wrote:
The Elasticsearch documentation is always frustratingly silent on the
things I seem to need to accomplish to make life easier.
Sorry you feel that way. If you are willing to fix the documentation those
pull requests are typically merged quickly and don't need to be attached to
an issue.
Is it possible to use a transform script in a mapping to alter the
document _id? This would be a convenient way to de-dup incoming data I have
too little control over if so.
I wrote the transform script feature and I have no idea. It wasn't for that
but that doesn't mean it couldn't work.
After digging around online and not finding much of use, I naively tried
this in Elasticsearch 1.4, which alas did nothing:
"typename": {
  "transform": {
    "script": "ctx._source['useAsId'] = ctx._source['a'] + ctx._source['b']",
    "lang": "groovy"
  },
  "_id": {
    "path": "useAsId"
  },
  "properties": {
    "a": { "type": "string" },
    "b": { "type": "string" },
    "useAsId": { "type": "string" }
  }
}
It seems that the ordering of operations isn't what I'd like it to be
under the hood; I don't get the _id I'd want out of this, but rather get
the standard auto-assigned _id value.
It looks like you've proven that transform won't work for this. As I said,
transform wasn't meant to do this, so I implemented it by adding it in the
most convenient spot, which iirc is right before fields are sent to Lucene.
Changing the ID would require doing it before the document is routed which
means the changes the script makes would have to be serialized for routing
and I didn't want to deal with that.
Sorry!
I imagine you could implement a transform early pretty easily if you just
wanted to change the ID because that wouldn't need special serialization.
But then you'd have to wait for the pull request to be reviewed and for the
next minor release which is probably not what you want.
So is there a way to process an incoming document to alter the _id value
in this sort of way? Or is there another more generally accepted route to
de-duping?
The route I use is id based. You have to construct the ID in the client
application though.
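
A minimal sketch of building the ID in the client, assuming the same fields
"a" and "b" from the mapping above. The helper names (make_doc_id,
build_index_request) and the index name are hypothetical. It hashes the
fields with SHA-1 rather than concatenating them as the transform script
did, so the ID has a fixed length and ("ab", "c") cannot collide with
("a", "bc"). Documents with the same a and b land on the same _id, so a
repeat overwrites the earlier copy instead of creating a duplicate.

```python
import hashlib
import json


def make_doc_id(doc, fields=("a", "b")):
    """Derive a deterministic ID from the chosen fields of a document."""
    parts = []
    for name in fields:
        value = str(doc[name])
        # Length-prefix each value so field boundaries stay unambiguous.
        parts.append("%d:%s" % (len(value), value))
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def build_index_request(index, doc_type, doc):
    """Return (method, path, body) for indexing doc under its derived ID.

    Uses the PUT /{index}/{type}/{id} form of the index API; sending the
    request is left to whatever HTTP client the application already uses.
    """
    path = "/%s/%s/%s" % (index, doc_type, make_doc_id(doc))
    return "PUT", path, json.dumps(doc, sort_keys=True)


if __name__ == "__main__":
    # "myindex" is a placeholder index name.
    print(build_index_request("myindex", "typename", {"a": "foo", "b": "bar"}))
```
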
You can totally perform queries in the client before the update to
check for dupes, but that too is a pain. You could make them perform pretty
well with routing, I imagine, though.
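
A rough sketch of the query-first route, under a few assumptions that are
not from the thread: "a" and "b" are mapped as not_analyzed so a term query
matches the exact value, documents are indexed with the value of "a" as
their routing, and build_dupe_check is a hypothetical helper name. It only
builds the search request; the client would index the document only when
hits.total in the response is 0. Two clients checking at the same moment
can still both see 0, which is part of why this route is a pain.

```python
import json
from urllib.parse import quote


def build_dupe_check(index, doc_type, doc, fields=("a", "b"), routing_field="a"):
    """Return (path, body) for a search that looks for an existing duplicate."""
    # The routing value must match the one used at index time, so the
    # search only has to visit the single shard that could hold a dupe.
    routing = quote(str(doc[routing_field]), safe="")
    path = "/%s/%s/_search?routing=%s" % (index, doc_type, routing)
    body = {
        "size": 0,  # only the hit count is needed, not the documents
        "query": {
            "bool": {
                "must": [{"term": {name: doc[name]}} for name in fields]
            }
        },
    }
    return path, json.dumps(body)
```
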
Nik
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/0d2fbf40-619e-4af6-b991-4c4cfa4133c0%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.