Hello,
I have an index with 3 000 000 documents from a shipping platform. Each document describes one product, and I would like to put similar products into one bucket. To be considered “similar”, five fields have to match (in the example below only two fields are used). I tried to solve this with the composite aggregation as follows:
GET shop/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "composite": {
        "sources": [
          { "seller": { "terms": { "field": "SellerId.keyword" } } },
          { "description": { "terms": { "field": "description.keyword" } } }
        ]
      },
      "aggs": {
        "filter": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count > 1"
          }
        }
      }
    }
  }
}
The response is:
{
  "took" : 1855,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "duplicates" : {
      "after_key" : {
        "seller" : "10000407",
        "description" : "car"
      },
      "buckets" : [
        {
          "key" : {
            "seller" : "10000068",
            "description" : "phone"
          },
          "doc_count" : 2
        }
      ]
    }
  }
}
I don't understand the response, or why the 'after_key' is added automatically. I definitely have more documents where the SellerId and the description are the same.
And another question: Is there a way to count all buckets?
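As far as I understand the composite aggregation, it returns its buckets in pages (10 per page unless "size" is set), and 'after_key' is the key to pass back as "after" to fetch the next page. The bucket_selector seems to be applied to each page after it has been built, which would explain why only one of the buckets on the first page is left. Below is a minimal Python sketch of paging through everything and counting the remaining buckets. The `search` callable and the helper names are hypothetical: `search` stands for whatever client call sends a request body to shop/_search and returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Optional


def build_request(after_key: Optional[Dict[str, Any]] = None,
                  page_size: int = 1000) -> Dict[str, Any]:
    """Build the request body from the post, plus "size" and an optional "after"."""
    composite: Dict[str, Any] = {
        "size": page_size,  # composite buckets per page, before filtering
        "sources": [
            {"seller": {"terms": {"field": "SellerId.keyword"}}},
            {"description": {"terms": {"field": "description.keyword"}}},
        ],
    }
    if after_key is not None:
        # Continue after the last composite key of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "duplicates": {
                "composite": composite,
                "aggs": {
                    "filter": {
                        "bucket_selector": {
                            "buckets_path": {"doc_count": "_count"},
                            "script": "params.doc_count > 1",
                        }
                    }
                },
            }
        },
    }


def count_duplicate_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                            page_size: int = 1000) -> int:
    """Page through the aggregation and count the buckets left after filtering.

    `search` is a hypothetical callable: request body in, response dict out.
    Assumption: the response carries no after_key once all pages are consumed.
    """
    total = 0
    after_key: Optional[Dict[str, Any]] = None
    while True:
        response = search(build_request(after_key, page_size))
        agg = response["aggregations"]["duplicates"]
        # A page may hold fewer buckets than page_size (even none), because
        # the bucket_selector removes buckets after the page has been built.
        total += len(agg["buckets"])
        after_key = agg.get("after_key")
        if after_key is None:
            break
    return total
```

The loop keeps going on pages whose "buckets" list is empty, because a page can be filtered down to nothing while later pages still contain duplicates.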
I would be very grateful for any help!
I wish you the best, Hella