Transform vs copy_to, text concatenation used for aggregation

noelyim · October 7, 2015, 2:45pm

I have a similar question to the poster quoted below:

I have 2 fields:

doc_a.prop.proptexta
doc_a.prop.proptextb

I want to identify duplicates based on these 2 fields of doc_a. So I tried

copy_to:
I created a mapping in which both proptexta and proptextb have copy_to set to proptextab, and then used the new field proptextab.
I assumed proptextab would be the concatenation of proptexta and proptextb. I then ran an aggregation on proptextab, and the resulting buckets are separate:

Say proptexta = 123 and proptextb = abc.

In my aggregation result on proptextab,
I have

buckets: 
key: "123" 
doc_count:...
key: "abc"
doc_count:....

whereas I was expecting

buckets:
key: 123abc
doc_count:...

So now I'm researching the transform script in mappings, but I am not sure it will work either.

Continuing the discussion from Is it possible to use transform scripts in mappings to alter document _id?:

The Elasticsearch documentation is always frustratingly silent on the
things I seem to need to accomplish to make life easier.

Is it possible to use a transform script in a mapping to alter the document
_id? This would be a convenient way to de-dup incoming data I have too
little control over if so.

After digging around online and not finding much of use, I naively tried
this in Elasticsearch 1.4, which alas did nothing:

"typename": {
  "transform": {
    "script": "ctx._source['useAsId'] = ctx._source['a'] + ctx._source['b']",
    "lang": "groovy"
  },
  "_id": {
    "path": "useAsId"
  },
  "properties": {
    "a": { "type": "string" },
    "b": { "type": "string" },
    "useAsId": { "type": "string" }
  }
}

It seems that the ordering of operations isn't what I'd like it to be under
the hood; I don't get the _id I'd want out of this, but rather get the
standard auto-assigned _id value.

So is there a way to process an incoming document to alter the _id value in
this sort of way? Or is there another more generally accepted route to
de-duping?

nik9000 · October 7, 2015, 3:05pm

copy_to makes a multi-valued field rather than one whose value is the concatenation of the two values. It's

{
  "key": ["123", "abc"]
}

rather than

{
  "key": "123abc"
}

Transform could be used to make what you are looking for, but I advise against it because it's deprecated and dangerously difficult to debug. Your best bet is to do the concatenation in your application on the way into Elasticsearch. That way you can get the _source back to verify that it worked properly.
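
A minimal Python sketch of that application-side approach, in case it helps. The helper name add_combined_field, the sep parameter, and the sample documents are assumptions made up for illustration; only the field names come from the question. The Counter at the end is a local stand-in for a terms aggregation, not a call to Elasticsearch.

```python
from collections import Counter


def add_combined_field(doc, sep=""):
    """Return a copy of doc with prop.proptextab = proptexta + sep + proptextb.

    Hypothetical helper: call it on each document before indexing.
    """
    prop = dict(doc["prop"])  # copy so the caller's document is not mutated
    prop["proptextab"] = f'{prop["proptexta"]}{sep}{prop["proptextb"]}'
    return {**doc, "prop": prop}


# Made-up sample documents using the field names from the question.
docs = [
    {"prop": {"proptexta": "123", "proptextb": "abc"}},
    {"prop": {"proptexta": "123", "proptextb": "abc"}},
    {"prop": {"proptexta": "123", "proptextb": "xyz"}},
]
prepared = [add_combined_field(d) for d in docs]

# Local stand-in for a terms aggregation on proptextab:
# one bucket per distinct combined value.
buckets = Counter(d["prop"]["proptextab"] for d in prepared)
duplicates = {key: count for key, count in buckets.items() if count > 1}
print(duplicates)
```

With an empty separator, "12" + "3abc" and "123" + "abc" produce the same key, so passing a sep that cannot occur in either field is probably safer. The combined field would likely also need to be mapped as not_analyzed so that the terms aggregation buckets on the whole value.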

noelyim · October 7, 2015, 3:16pm

Thank you for the quick reply.

I also wonder if I can use a script for the fields in an aggregation.
I am able to do this with my two sets of docs (doc_a and doc_b) in one index; it works fine with the single field proptexta:

"aggs": {
    "MyAggResultComparingTwoDocNames": {
        "terms": {
            "script": "doc['doc_a.prop.proptexta'].values + doc['doc_b.prop.proptexta'].values"
        }
    }
}

Is there a Groovy script solution for the following?

(concatenate(doc_a.prop.proptexta,doc_a.prop.proptextb)).values + (concatenate(doc_b.prop.proptexta,doc_b.prop.proptextb)).values
