duplicateNames with multiple fields?


(Liam Pieri) #1

Hello, I'm able to perform a query and find duplicate results with:

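comparison_field = "name"  # hypothetical: the field checked for duplicate values

query = {
    "aggs": {
        "duplicateNames": {
            "terms": {
                "field": comparison_field,
                "size": 0,  # ES 2.x: 0 means "return all buckets"
                "min_doc_count": 2  # keep only values found in 2+ documents
            }
        }
    }
}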
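For reference, a minimal sketch of reading the duplicates back out of the response, assuming the standard terms aggregation response shape (aggregations, then duplicateNames, then buckets, each bucket carrying key and doc_count). The sample response is made up for illustration; in practice it would come from a call such as es.search(index=..., body=query) in the elasticsearch-py client.

```python
from typing import Any, Dict, List, Tuple


def duplicate_values(
    response: Dict[str, Any], agg_name: str = "duplicateNames"
) -> List[Tuple[Any, int]]:
    """Return (value, doc_count) pairs from a terms aggregation response.

    Because the query sets min_doc_count to 2, every bucket returned
    already corresponds to a value shared by at least two documents.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical response, shaped like the output of a terms aggregation.
sample_response = {
    "aggregations": {
        "duplicateNames": {
            "buckets": [
                {"key": "alice", "doc_count": 3},
                {"key": "bob", "doc_count": 2},
            ]
        }
    }
}

print(duplicate_values(sample_response))
```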

However, is there a way to do this across multiple fields, so that a document only counts as a duplicate when all of those fields are duplicated?


(David Pilato) #2

I'd probably compute a "hash" field from the combined values of the other fields at index time, then aggregate on it.
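A minimal sketch of that idea, run client-side before each document is sent for indexing. The field names in DEDUP_FIELDS and the dedup_hash field are hypothetical; the hash field would need to be mapped as a not_analyzed string (ES 2.x) or as keyword (5.x+) so the terms aggregation sees the whole value. The values are joined with a separator rather than simply concatenated, because a plain concatenation can make different combinations collide.

```python
import hashlib
from typing import Any, Dict, Sequence

# Hypothetical fields that together define a "duplicate" document.
DEDUP_FIELDS = ("first_name", "last_name", "email")


def dedup_hash(doc: Dict[str, Any], fields: Sequence[str] = DEDUP_FIELDS) -> str:
    """Return a SHA-1 hex digest of the selected field values.

    Joining with an ASCII unit separator keeps ("ab", "c") and ("a", "bc")
    apart. Note: a missing field and an empty string hash the same way here.
    """
    combined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.sha1(combined.encode("utf-8")).hexdigest()


def with_dedup_hash(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with the hypothetical dedup_hash field added."""
    enriched = dict(doc)
    enriched["dedup_hash"] = dedup_hash(doc)
    return enriched


# Duplicate check: a single terms aggregation on the hash field.
# "size": 0 inside "terms" means "all buckets" in ES 2.x only.
query = {
    "size": 0,  # no search hits, only the aggregation
    "aggs": {
        "duplicateNames": {
            "terms": {"field": "dedup_hash", "size": 0, "min_doc_count": 2}
        }
    },
}
```

Each document would then be indexed as with_dedup_hash(doc), and every bucket the aggregation returns is a combination of values shared by at least two documents.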


(Liam Pieri) #3

That's a fine solution, but given what it would entail for my system, I'd prefer a workaround.


(David Pilato) #4

Then you could probably use this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-script

and emit a unique key built by combining the values of the different fields.

I expect it will be dramatically slow, but if speed is not an issue, it might fit your needs.
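A sketch of the script-based variant, written for the ES 2.x request format (inline Groovy script, with size 0 meaning all buckets). The field names are hypothetical, and it assumes single-valued, not_analyzed string fields: on an analyzed field, doc['...'].value returns a single indexed term rather than the original value. Note that size and min_doc_count are parameters of the terms object itself, so they sit inside it. Inline scripting must be enabled on the cluster for this request to run.

```python
from typing import Any, Dict, Sequence


def combined_key_script(fields: Sequence[str], sep: str = "|") -> str:
    """Build a script expression joining several field values into one key.

    sep must not contain a single quote, since it is embedded in a
    single-quoted script string literal.
    """
    return (" + '%s' + " % sep).join("doc['%s'].value" % f for f in fields)


def duplicate_combination_query(fields: Sequence[str], sep: str = "|") -> Dict[str, Any]:
    """Terms aggregation over the combined key (ES 2.x request format)."""
    return {
        "size": 0,  # no search hits, only the aggregation
        "aggs": {
            "duplicateNames": {
                "terms": {
                    "script": {
                        "inline": combined_key_script(fields, sep),
                        "lang": "groovy",  # default script language in ES 2.x
                    },
                    "size": 0,  # ES 2.x: return every bucket
                    "min_doc_count": 2,  # only combinations seen 2+ times
                }
            }
        },
    }


print(duplicate_combination_query(["first_name", "last_name"]))
```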


(Liam Pieri) #5

Speed's not an issue, so I'll try this out. Thanks!


(Liam Pieri) #7

I'm running into an issue using this, although it may not be related to the query. I get an elasticsearch.exceptions.RequestError: <exception str() failed>
when I pass this query:

{
    'aggs': {
        'duplicateNames': {
            'terms': {
                'script': "doc['@timestamp'].value",
                'min_doc_count': 2,  # terms parameters belong inside 'terms'
                'size': 0
            }
        }
    }
}

Any idea what this could be? I'm using Elasticsearch 2.4, so the syntax is a bit different.

edit: I believe the issue is that scripting is not enabled by default.


(David Pilato) #8

No.

Could you provide a full recreation script, as described in About the Elasticsearch category? It will help us better understand what exactly you are doing. Please try to keep the example as simple as possible.


(Liam Pieri) #9

I will do so for future questions. After some investigation, I figured out that it wasn't working because scripting wasn't enabled.

I can't change that, so I'm opting to try a solution mentioned here. If I can't get that to work, I will use your hash idea.

Thank you.
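If scripting has to stay disabled, one workaround that needs no script at all (offered only as a sketch, and not necessarily the solution mentioned in the post above) is to nest one terms aggregation per field, each with min_doc_count 2, and read the duplicate combinations from the innermost buckets. Pruning at the outer levels is safe because a combination that occurs twice implies each of its values occurs at least twice. The by_<field> aggregation names and the sample response are hypothetical; the fields are assumed single-valued and not_analyzed, and size 0 is the ES 2.x "all buckets" setting, which can be memory-hungry on high-cardinality fields.

```python
from typing import Any, Dict, List, Sequence, Tuple


def nested_terms_query(fields: Sequence[str]) -> Dict[str, Any]:
    """Nest one terms aggregation per field (outermost = first field)."""
    agg: Dict[str, Any] = {}
    # Build from the innermost field outwards.
    for field in reversed(fields):
        terms: Dict[str, Any] = {
            "terms": {"field": field, "size": 0, "min_doc_count": 2}
        }
        if agg:
            terms["aggs"] = agg
        agg = {"by_%s" % field: terms}
    return {"size": 0, "aggs": agg}


def duplicate_combinations(
    aggs: Dict[str, Any], fields: Sequence[str], prefix: Tuple[Any, ...] = ()
) -> List[Tuple[Tuple[Any, ...], int]]:
    """Walk the nested buckets and return (value combination, doc_count) pairs."""
    results: List[Tuple[Tuple[Any, ...], int]] = []
    for bucket in aggs["by_%s" % fields[0]]["buckets"]:
        key = prefix + (bucket["key"],)
        if len(fields) == 1:
            results.append((key, bucket["doc_count"]))
        else:
            # Sub-aggregation results live directly inside each bucket.
            results.extend(duplicate_combinations(bucket, fields[1:], key))
    return results


fields = ["first_name", "last_name"]
# Hypothetical response for nested_terms_query(fields).
sample_response = {
    "aggregations": {
        "by_first_name": {
            "buckets": [
                {"key": "ann", "doc_count": 3,
                 "by_last_name": {"buckets": [{"key": "lee", "doc_count": 2}]}},
                {"key": "bob", "doc_count": 2,
                 "by_last_name": {"buckets": []}},
            ]
        }
    }
}

print(duplicate_combinations(sample_response["aggregations"], fields))
```

Here "bob" appears twice, but with different last names, so only the ("ann", "lee") combination is reported.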


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.