Aggregation numbers do not add up

Hello friends!
I'm trying to learn aggregations in Elasticsearch for my company's backend.
Our documents include a label field (which can take many different keyword values), and I would like to aggregate on it and calculate certain metrics in small time steps.
So, for example:

GET /inferences/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "date_histogram": {
        "field": "created_at",
        "fixed_interval": "5m"
      },
      "aggs": {
        "label_buckets": {
          "terms": {
            "field": "model_predictions.predictions.label.classification.value.keyword",
            "min_doc_count": 0
          }
        },
        "confidence": {
          "avg": {
            "field": "confidence"
          }
        }
      }
    }
  }
}

Currently in my demo environment I have docs with the labels "defect" and "yes". I would like every time bucket to show all found labels, even when a label has 0 documents in that bucket. What I actually get is:

  "aggregations" : {
    "histogram" : {
      "buckets" : [
        {
          "key_as_string" : "1970-01-19T23:45:00.000Z",
          "key" : 1640700000,
          "doc_count" : 2,
          "confidence" : {
            "value" : 0.2989208698272705
          },
          "label_buckets" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "defect",
                "doc_count" : 0
              }
            ]
          }
        },
        {
          "key_as_string" : "1970-01-19T23:50:00.000Z",
          "key" : 1641000000,
          "doc_count" : 8,
          "confidence" : {
            "value" : 0.6964685171842575
          },
          "label_buckets" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "defect",
                "doc_count" : 0
              }
            ]
          }
        },
        {
          "key_as_string" : "1970-01-19T23:55:00.000Z",
          "key" : 1641300000,
          "doc_count" : 6,
          "confidence" : {
            "value" : 1.0
          },
          "label_buckets" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "defect",
                "doc_count" : 0
              }
            ]
          }
        },
        {
          "key_as_string" : "1970-01-20T00:00:00.000Z",
          "key" : 1641600000,
          "doc_count" : 3,
          "confidence" : {
            "value" : 0.8757350246111552
          },
          "label_buckets" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          }
        }
      ]
    }
  }

Two things stand out to me which I can't figure out:

  1. Why do the labels not appear as buckets everywhere? The "defect" bucket only appears in 3 of the 4 time buckets, and there are no "yes" buckets at all.
  2. Why do the numbers not add up? Every time bucket has a positive doc_count, but the label buckets inside it only ever show 0.
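
To make the second point concrete, here is a small Python sketch that compares each time bucket's doc_count with the sum of its label bucket counts. The numbers are copied from the response above; the names time_buckets and unaccounted_docs are made up for this illustration. It assumes each document carries at most one label, since a document with several label values would be counted in several label buckets.

```python
# Time buckets copied from the response above:
# (key, doc_count, {label: doc_count of that label bucket}).
time_buckets = [
    (1640700000, 2, {"defect": 0}),
    (1641000000, 8, {"defect": 0}),
    (1641300000, 6, {"defect": 0}),
    (1641600000, 3, {}),
]


def unaccounted_docs(doc_count, label_counts):
    """Documents in a time bucket that no label bucket accounts for."""
    return doc_count - sum(label_counts.values())


# Map each time bucket key to the number of documents without a label bucket.
gaps = {key: unaccounted_docs(n, labels) for key, n, labels in time_buckets}

for key, gap in gaps.items():
    print(key, gap)
```

With this data every document in every time bucket is unaccounted for: none of the 19 documents shows up in a label bucket.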

Interestingly, the confidence aggregation seems to work fine. I haven't checked the math myself yet, but since its results are positive it must be calculating over something.
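
One way to do that math by hand, as a rough sketch: fetch the documents of a single time bucket (for instance with a range query on created_at, not shown here) and average the field directly. The helper name mean_confidence is hypothetical, and the code assumes confidence is a single-valued, top-level number in _source, as the field name in the query suggests. Hits without the field are skipped, which should match how avg treats documents that have no value for the field.

```python
def mean_confidence(hits):
    """Average `_source.confidence` over search hits.

    `hits` is the list found under hits.hits in a _search response.
    Hits without a confidence value are skipped; returns None if
    no hit has one.
    """
    values = [
        hit["_source"]["confidence"]
        for hit in hits
        if "confidence" in hit.get("_source", {})
    ]
    return sum(values) / len(values) if values else None
```

If the result for a bucket matches the "confidence" value reported for it, the avg sub-aggregation is running over the same documents that doc_count reports.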

I'd love to know what I'm doing wrong here! Thanks!
