Hi Mark,
First of all, thank you for your reply. I tried the composite aggregation:
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "test_bucket": {
      "composite": {
        "sources": [
          { "category": { "terms": { "field": "category.keyword" } } },
          { "hash": { "terms": { "field": "hash" } } }
        ]
      }
    }
  }
}
but I am getting this as a result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 15,
    "successful": 15,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "test_bucket": {
      "buckets": [
        {
          "key": {
            "category": "the category 1",
            "hash": "abcdefg"
          },
          "doc_count": 2
        },
        {
          "key": {
            "category": "the category 1",
            "hash": "asdfjklñ"
          },
          "doc_count": 1
        },
        {
          "key": {
            "category": "the category 2",
            "hash": "fghijk"
          },
          "doc_count": 1
        }
      ]
    }
  }
}
As can be seen in the first bucket ("the category 1", hash "abcdefg"), the doc_count is 2. I would expect a doc_count of 1 there, because I am trying to de-duplicate items: the two documents in that bucket have the same hash, one in index 1 and one in index 2.
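To make the expectation concrete, here is a small Python sketch of what I think is happening versus what I am after. The sample documents are hypothetical stand-ins for my four hits, and which index the non-duplicated documents live in is an assumption made only for illustration:

```python
from collections import Counter

# Hypothetical sample documents mirroring the four hits above.
# The "index" values for the non-duplicated documents are assumptions.
docs = [
    {"index": "index 1", "category": "the category 1", "hash": "abcdefg"},
    {"index": "index 2", "category": "the category 1", "hash": "abcdefg"},
    {"index": "index 1", "category": "the category 1", "hash": "asdfjklñ"},
    {"index": "index 1", "category": "the category 2", "hash": "fghijk"},
]

# What the composite aggregation appears to report: one bucket per
# (category, hash) pair, with doc_count = number of documents in it.
composite_buckets = Counter((d["category"], d["hash"]) for d in docs)

# What I am after: each (category, hash) pair counted only once,
# i.e. the number of distinct hashes per category.
unique_per_category = Counter(category for category, _ in composite_buckets)

print(composite_buckets[("the category 1", "abcdefg")])
print(unique_per_category["the category 1"])
```

In this sketch the duplicated hash still yields a single bucket, but its count reflects both documents. The de-duplicated figure I want corresponds to the number of buckets per category rather than to any doc_count.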
Am I doing something wrong here, or is this the expected behavior, meaning that what I want to achieve is not possible?
Thanks in advance.