Transform vs copy_to, text concatenation used for aggregation

(Noel) #1

I have a similar question to this poster:

I have 2 fields:

  1. doc_a.prop.proptexta
  2. doc_a.prop.proptextb

I want to identify whether there are duplicates based on these 2 fields of doc_a. So I tried the following:

I created a mapping in which both proptexta and proptextb have copy_to set to proptextab, and then used the new field proptextab.
I assumed proptextab would be a concatenation of proptexta and proptextb. Then I ran an aggregation on proptextab, and the resulting buckets are separate:

Say proptexta = 123, proptextb = abc.

In my aggregation result using proptextab, I have:

key: "123" 
key: "abc"

whereas I was expecting:

key: 123abc

So now I'm researching transform scripts in mappings, but I am not sure that will work either.

Continuing the discussion from Is it possible to use transform scripts in mappings to alter document _id?:

(Nik Everett) #2

copy_to makes a multi-valued field rather than one whose value is the concatenation of the two values. It's

  "key": ["123", "abc"]

rather than

  "key": "123abc"
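To make the difference concrete, here is a rough Python simulation of how a terms aggregation buckets the two shapes. It is only a sketch: terms_buckets is a hypothetical helper written for illustration, not an Elasticsearch API, and it assumes each field value is treated as a single un-analyzed term.

```python
from collections import Counter


def terms_buckets(docs, field):
    """Count documents per distinct value of `field`.

    A list value is treated as a multi-valued field, so the document
    lands in one bucket per distinct value (hypothetical helper).
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field fall into no bucket
        values = value if isinstance(value, list) else [value]
        for v in sorted(set(values)):
            counts[v] += 1
    return dict(counts)


docs = [{"proptexta": "123", "proptextb": "abc"}]

# What copy_to effectively produces: one field holding both values.
copy_to_view = [{"proptextab": [d["proptexta"], d["proptextb"]]} for d in docs]

# What the question expected: one field holding the joined value.
concat_view = [{"proptextab": d["proptexta"] + d["proptextb"]} for d in docs]

print(terms_buckets(copy_to_view, "proptextab"))  # {'123': 1, 'abc': 1}
print(terms_buckets(concat_view, "proptextab"))   # {'123abc': 1}
```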

Transform could be used to make what you are looking for, but I advise against it because it's deprecated and dangerously difficult to debug. Your best bet is to do the concatenation in your application on the way into Elasticsearch. That way you can get the _source back to verify that it worked properly.
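A minimal sketch of doing the concatenation in the application, in Python. The function name with_concatenated_key and the separator parameter are made up for illustration; the field names come from the question, and the actual indexing call is left out because it depends on your client.

```python
import copy


def with_concatenated_key(doc, separator=""):
    """Return a copy of `doc` with doc_a.prop.proptextab set to the
    two text fields joined together (hypothetical helper)."""
    enriched = copy.deepcopy(doc)  # leave the caller's document untouched
    prop = enriched["doc_a"]["prop"]
    prop["proptextab"] = separator.join([prop["proptexta"], prop["proptextb"]])
    return enriched


source = {"doc_a": {"prop": {"proptexta": "123", "proptextb": "abc"}}}
enriched = with_concatenated_key(source)

print(enriched["doc_a"]["prop"]["proptextab"])  # 123abc
# Index `enriched` instead of `source`; proptextab then shows up in _source.
```

If the two values can run into each other (for example "12" + "3abc" and "123" + "abc" both give "123abc"), passing a separator that cannot appear in either field avoids false duplicates. For the terms aggregation to return the whole value as one bucket, the new field presumably also needs to be mapped as not analyzed.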

(Noel) #3

Thank you for the quick reply.

I also wonder whether I can use a script for the fields in the aggregation.
I am able to do this with my two sets of docs (doc_a and doc_b) in one index; it works fine with one field, proptexta:

    "MyAggResultComparingTwoDocNames": {
        "terms": {
            "script": "doc['doc_a.prop.proptexta'].values + doc['doc_b.prop.proptexta'].values"
        }
    }

Is there a Groovy script solution for the following?

(concatenate(doc_a.prop.proptexta,doc_a.prop.proptextb)).values + (concatenate(doc_b.prop.proptexta,doc_b.prop.proptextb)).values
