Grouping documents with the same values in multiple fields into one bucket

Hello,
I have a dataset with 3,000,000 documents from a shipping platform. Each document describes one product, and I would like to put similar products into one bucket. To be considered “similar”, five fields have to be the same (in the example below only two fields are aggregated). I tried to solve this with the composite aggregation as follows:

GET shop/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "composite": {
        "sources": [
          { "seller": { "terms": { "field": "SellerId.keyword" } } },
          { "description": { "terms": { "field": "description.keyword" } } }
        ]
      },
      "aggs": {
        "filter": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count > 1"
          }
        }
      }
    }
  }
}

The response is:

{
  "took" : 1855,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "duplicates" : {
      "after_key" : {
        "seller" : "10000407",
        "description" : "car"
      },
      "buckets" : [
        {
          "key" : {
            "seller" : "10000068",
            "description" : "phone"
          },
          "doc_count" : 2
        }
      ]
    }
  }
}

I don't understand the response, or why the 'after_key' is added automatically. I definitely have more documents where SellerId and description are the same, yet only one bucket is returned.
And another question: is there a way to count all buckets?
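
Edit: as far as I can tell from the composite aggregation documentation, 'after_key' is the last composite key of the current page and can be passed back as "after" to fetch the next page. The default page size seems to be 10, and the bucket_selector is applied only to the buckets of that one page, which might explain why I see a single bucket. If that is right, the follow-up request would look roughly like the sketch below. The "size" of 1000 is an arbitrary value I picked for illustration, and the "after" values are simply copied from the response above.

```
# Sketch only: "size": 1000 is an assumed page size, "after" is the after_key from the previous response
GET shop/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "composite": {
        "size": 1000,
        "after": { "seller": "10000407", "description": "car" },
        "sources": [
          { "seller": { "terms": { "field": "SellerId.keyword" } } },
          { "description": { "terms": { "field": "description.keyword" } } }
        ]
      },
      "aggs": {
        "filter": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count > 1"
          }
        }
      }
    }
  }
}
```

If I understand it correctly, repeating this with each new 'after_key' until the response no longer contains one should walk through all buckets, so counting them would mean summing the returned buckets across pages on the client side.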
I would be very grateful for any help!
I wish you the best, Hella :)
