duplicateNames with multiple fields?

Hello, I'm able to perform a query and find duplicate results with:

query = {
    "aggs": {
        "duplicateNames": {
            "terms": {
                "field": comparison_field,
                "size": 0,
                "min_doc_count": 2
            }
        }
    }
}

However, is there a way I can do this for multiple fields, ensuring that the document has all of the fields duplicated?

I'd probably compute a "hash" field based on the sum of the values of other fields at index time, then I'd try to aggregate on it.
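A minimal sketch of that idea in Python, not a definitive implementation. It swaps the "sum of the values" for a SHA-1 digest over the JSON-encoded list of values, so that ("ab", "c") and ("a", "bc") do not collide. The helper names and the `dup_key` field name are hypothetical, and I'm assuming the field would be mapped as `not_analyzed` so the terms aggregation sees the digest as a single value.

```python
import hashlib
import json


def duplicate_key(doc, fields):
    """Return a stable hex digest built from the values of `fields` in `doc`."""
    # Encode the values as a JSON list so field boundaries stay unambiguous.
    values = [doc.get(name) for name in fields]
    payload = json.dumps(values, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def with_duplicate_key(doc, fields, key_field="dup_key"):
    """Return a copy of `doc` with the digest stored under `key_field` (hypothetical name)."""
    enriched = dict(doc)
    enriched[key_field] = duplicate_key(doc, fields)
    return enriched


def duplicates_query(key_field="dup_key"):
    """Same shape as the single-field query above, aggregating on the digest field."""
    return {
        "aggs": {
            "duplicateNames": {
                "terms": {
                    "field": key_field,
                    "size": 0,
                    "min_doc_count": 2,
                }
            }
        }
    }
```

You would call `with_duplicate_key(doc, ["field_a", "field_b"])` on each document before indexing it, then run `duplicates_query()` as usual. Existing documents would need to be reindexed to get the new field.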

That's a fine solution, but given what that would entail for my system I'd prefer a workaround.

Then you can probably use this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-script

And emit a unique key based on the combination of the values you have in different fields.

I predict that it will be dramatically slow, but if speed is not an issue it might fit your needs.
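Something along these lines might work as a starting point. It is only a sketch: the helper names, the `|` separator, and the example field names are hypothetical, and it assumes the fields hold single string values, that field names contain no single quotes, and that the cluster accepts an inline script passed as a plain string. `size` and `min_doc_count` sit inside `terms`, as in the first query in this thread.

```python
def combined_key_script(fields, separator="|"):
    """Build a script expression that joins the doc values of `fields` into one key."""
    # Each part reads a single field value from doc values.
    parts = ["doc['{0}'].value".format(name) for name in fields]
    # Join the parts with a literal separator so adjacent values cannot run together.
    glue = " + '{0}' + ".format(separator)
    return glue.join(parts)


def duplicates_by_script_query(fields, agg_name="duplicateNames"):
    """Terms aggregation keyed on the combined value of several fields."""
    return {
        "aggs": {
            agg_name: {
                "terms": {
                    "script": combined_key_script(fields),
                    "size": 0,
                    "min_doc_count": 2,
                }
            }
        }
    }


# Hypothetical field names, for illustration only.
query = duplicates_by_script_query(["first_name", "last_name"])
```

Each bucket key would then be the joined values, and only combinations that occur in at least two documents come back.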

Speed's not an issue. I'll try this out, thanks!

I'm running into an issue using this, although it may not be related to the query. I get an elasticsearch.exceptions.RequestError: <exception str() failed> when I pass this query:

{
    'aggs': {
        'duplicateNames': {
            'terms': {
                'script': "doc['@timestamp'].value"
            },
            'min_doc_count': 2,
            'size': 0
        }
    }
}

Any idea what this could be? I'm using Elasticsearch 2.4, so the syntax is a bit different.

edit: I believe the issue is that scripting is not enabled by default.

No.

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand exactly what you are doing. Please try to keep the example as simple as possible.

I will do so for further questions. After some investigation I figured out that it wasn't working because scripting wasn't enabled.

I can't change that, so I'm opting to try a solution mentioned here. If I can't get that to work I will use your hash idea.

Thank you.
