Should doc['field_name'] remove duplicates?

gautesk · October 4, 2019, 11:23am

I am running Elasticsearch 6.3 and trying to perform a terms aggregation with a script as the source. In the script, I use doc values to access a field whose value is a list containing duplicates. It seems that all duplicates are removed when I access the field with doc['my_field1']. Is that the intended behavior?

Snippet from my document:
{
"my_field1": ["a", "a", "b", "b"],
"my_field2": ["c", "d", "e", "f"]
}

Snippet of my code:

  "aggregations": {
    "terms_agg1": {
      "terms": {
        "script": {
          "source": "\n String returnString = '';\n for (int i = 0; i < doc['my_field1'].length; i++) { returnString += doc['my_field1'][i] + ';' + doc['my_field2'][i] + ')'; } return returnString; ",
          "lang": "painless"
        }
      }
    }
  }

I expected the script to iterate over the entire content of my_field1 and my_field2, but the former is read in as ["a", "b"], so my loop runs only twice instead of four times.
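To make the mismatch concrete, here is a small Python simulation of the loop (it is not Elasticsearch or Painless code). It assumes that doc values for a field like this behave as a sorted set of unique values, which would be consistent with the ["a", "b"] reported above; that assumption is not confirmed anywhere in this thread. The helper names `build_key` and `as_doc_values` are hypothetical and exist only for illustration.

```python
def build_key(field1, field2):
    """Mirror the loop in the script: pair values by index, bounded by len(field1)."""
    result = ""
    for i in range(len(field1)):
        result += field1[i] + ";" + field2[i] + ")"
    return result


def as_doc_values(values):
    # ASSUMPTION: doc values are modeled here as a sorted list of unique values.
    # This is a simplification for illustration, not Elasticsearch's actual code.
    return sorted(set(values))


# Values as they appear in the document source.
source_field1 = ["a", "a", "b", "b"]
source_field2 = ["c", "d", "e", "f"]

# What the script would return if it saw the original arrays.
expected = build_key(source_field1, source_field2)

# What it returns if each field is deduplicated before the script reads it.
observed = build_key(as_doc_values(source_field1), as_doc_values(source_field2))

print(expected)  # a;c)a;d)b;e)b;f)
print(observed)  # a;c)b;d)
```

Under that assumption, the index-based pairing between the two fields is lost as soon as one of them contains duplicates, because the two lists no longer have the same length or alignment.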

As far as I can see, the doc_values documentation does not say anything about removing duplicates. I understand that the aggregation will eventually remove duplicates, but that should not apply to the source of the script, should it?

Any help would be greatly appreciated!

