I have a document type "foo" with an array of nested documents called "properties". I want to find any "foo"s that have duplicate entries in the "properties" array.
e.g. given two "foos":
"foo1": { "properties", [ { "display": "prop1"}, {"display": "prop2"} ]
and
"foo2": { "properties", [ { "display": "prop1"}, {"display": "prop1"} ]
I'm trying to construct a search that returns "foo2" but not "foo1" (foo2 has two properties with a display field of "prop1").
I know how to get counts of all values across all documents. But I can't figure out how to create a buckets array per top level document.
i.e. when I issue this query
GET /7/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"prop_counts": {
"nested" : {
"path": "properties"
},
"aggs": {
"inner_prop_counts": {
"terms": {
"field": "properties.display",
"size": 1000
}
}
}
}
}
}
I get results across all the top level documents:
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 764,
"max_score": 0,
"hits": []
},
"aggregations": {
"prop_counts": {
"doc_count": 35754,
"inner_prop_counts": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "name",
"doc_count": 3788
},
{
"key": "type",
"doc_count": 3192
},
{
"key": "family",
"doc_count": 2228
},
{
"key": "offset",
"doc_count": 2202
},
....
How can I get a set of results that is limited to each top level document? And then how do I filter that result to just give back results where doc_count > 2?
Any help greatly appreciated.
Thanks,
John H