I have a document type "foo" with an array of nested documents called "properties". I want to find any "foo"s that have duplicate entries in the "properties" array.
e.g. given two "foos":
"foo1": { "properties": [ { "display": "prop1" }, { "display": "prop2" } ] }
and
"foo2": { "properties": [ { "display": "prop1" }, { "display": "prop1" } ] }
I'm trying to construct a search that returns "foo2" but not "foo1" (foo2 has two properties with a display field of "prop1").
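To make the desired result concrete, here is a minimal client-side sketch in Python of the check I would like Elasticsearch to perform for me. It only uses the two example documents above as plain dicts; the names duplicated_displays and foos_with_duplicates are made up for illustration and are not part of any Elasticsearch API.

```python
from collections import Counter

# The two example documents from above, as plain dicts.
foos = {
    "foo1": {"properties": [{"display": "prop1"}, {"display": "prop2"}]},
    "foo2": {"properties": [{"display": "prop1"}, {"display": "prop1"}]},
}


def duplicated_displays(doc):
    """Return the display values that occur more than once in one document."""
    counts = Counter(prop["display"] for prop in doc["properties"])
    return sorted(value for value, n in counts.items() if n >= 2)


# Keep only the documents that have at least one repeated display value.
foos_with_duplicates = {}
for name, doc in foos.items():
    repeated = duplicated_displays(doc)
    if repeated:
        foos_with_duplicates[name] = repeated

print(foos_with_duplicates)  # {'foo2': ['prop1']}
```

The counting here is done per document, which is the part I can't reproduce with an aggregation.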
I know how to get counts of all values across all documents, but I can't figure out how to get a separate buckets array for each top-level document.
For example, when I issue this query:
GET /7/_search
{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "prop_counts": {
            "nested" : {
                "path": "properties"            
            },
            "aggs": {
                "inner_prop_counts": {
                    "terms": {
                        "field": "properties.display",
                        "size": 1000
                    }
                }
            }            
        }    
    }
}
I get counts aggregated across all the top-level documents:
{
   "took": 24,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 764,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "prop_counts": {
         "doc_count": 35754,
         "inner_prop_counts": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "name",
                  "doc_count": 3788
               },
               {
                  "key": "type",
                  "doc_count": 3192
               },
               {
                  "key": "family",
                  "doc_count": 2228
               },
               {
                  "key": "offset",
                  "doc_count": 2202
               },
              ....
How can I get a separate set of buckets for each top-level document? And how do I then filter those results to return only the buckets where doc_count > 1 (that is, a display value that appears at least twice within the same document)?
Any help greatly appreciated.
Thanks,
John H