Hello friends!
I'm trying to learn aggregations in Elasticsearch for my company's backend.
Our documents include a label field (which can hold many different keywords), and I would like to aggregate and compute certain metrics over small time steps.
So, for example:
GET /inferences/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "date_histogram": {
        "field": "created_at",
        "fixed_interval": "5m"
      },
      "aggs": {
        "label_buckets": {
          "terms": {
            "field": "model_predictions.predictions.label.classification.value.keyword",
            "min_doc_count": 0
          }
        },
        "confidence": {
          "avg": {
            "field": "confidence"
          }
        }
      }
    }
  }
}
Currently, in my demo environment, I have docs with the labels "defect" and "yes". I would like every time bucket to list all labels found, even those with 0 results. What I actually get is:
"aggregations" : {
  "histogram" : {
    "buckets" : [
      {
        "key_as_string" : "1970-01-19T23:45:00.000Z",
        "key" : 1640700000,
        "doc_count" : 2,
        "confidence" : {
          "value" : 0.2989208698272705
        },
        "label_buckets" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "defect",
              "doc_count" : 0
            }
          ]
        }
      },
      {
        "key_as_string" : "1970-01-19T23:50:00.000Z",
        "key" : 1641000000,
        "doc_count" : 8,
        "confidence" : {
          "value" : 0.6964685171842575
        },
        "label_buckets" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "defect",
              "doc_count" : 0
            }
          ]
        }
      },
      {
        "key_as_string" : "1970-01-19T23:55:00.000Z",
        "key" : 1641300000,
        "doc_count" : 6,
        "confidence" : {
          "value" : 1.0
        },
        "label_buckets" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "defect",
              "doc_count" : 0
            }
          ]
        }
      },
      {
        "key_as_string" : "1970-01-20T00:00:00.000Z",
        "key" : 1641600000,
        "doc_count" : 3,
        "confidence" : {
          "value" : 0.8757350246111552
        },
        "label_buckets" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ ]
        }
      }
    ]
  }
}
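One more thing I noticed while pasting this: the time keys are rendered as dates in January 1970, which I did not expect. Here is a small Python sketch that reads the first key both ways. Elasticsearch reports date_histogram keys in epoch milliseconds; reading the same number as epoch seconds is only my guess at how created_at might have been written, not something I have confirmed.

```python
from datetime import datetime, timezone

# First time-bucket key, copied from the response above.
key = 1640700000

# Read as epoch milliseconds, which is how date_histogram keys are reported.
as_millis = datetime.fromtimestamp(key / 1000, tz=timezone.utc)

# Read as epoch seconds (an assumption about how created_at may have been stored).
as_seconds = datetime.fromtimestamp(key, tz=timezone.utc)

print(as_millis.isoformat())   # matches the key_as_string of the first bucket
print(as_seconds.isoformat())  # a date in late December 2021
```

If the second reading is the intended one, the 5m interval might not be grouping the documents the way I think it is, but I don't know whether this is related to the label problem.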
Two things stand out to me which I can't figure out:
- Why do the labels not appear as buckets everywhere? The "defect" bucket appears in only 3 of the 4 time buckets, and there are no "yes" buckets at all.
- Why do the numbers not add up? Every time bucket has a positive doc_count, but the label buckets inside it only show 0.
Interestingly, the confidence aggregation seems to work fine. I haven't done the math myself yet, but since its results are positive, it must be calculating over something.
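To make the mismatch in my two questions concrete, here is a minimal Python sketch that compares each time bucket's doc_count with the sum of its label counts. The function name label_count_gaps is my own, and the check assumes every document carries exactly one label value; documents with several labels or none would make the totals differ legitimately.

```python
def label_count_gaps(response):
    """Return (time key, doc_count, summed label counts) for each
    time bucket where the two numbers differ."""
    gaps = []
    for bucket in response["aggregations"]["histogram"]["buckets"]:
        labels = bucket["label_buckets"]
        # Count the listed label buckets plus whatever terms left out.
        counted = sum(b["doc_count"] for b in labels["buckets"])
        counted += labels["sum_other_doc_count"]
        if counted != bucket["doc_count"]:
            gaps.append((bucket["key_as_string"], bucket["doc_count"], counted))
    return gaps


# Trimmed copy of the first and last time buckets from the response above.
sample = {
    "aggregations": {
        "histogram": {
            "buckets": [
                {
                    "key_as_string": "1970-01-19T23:45:00.000Z",
                    "doc_count": 2,
                    "label_buckets": {
                        "sum_other_doc_count": 0,
                        "buckets": [{"key": "defect", "doc_count": 0}],
                    },
                },
                {
                    "key_as_string": "1970-01-20T00:00:00.000Z",
                    "doc_count": 3,
                    "label_buckets": {
                        "sum_other_doc_count": 0,
                        "buckets": [],
                    },
                },
            ]
        }
    }
}

for row in label_count_gaps(sample):
    print(row)
```

Both sample buckets are reported as gaps, which is exactly the behavior I don't understand.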
I'd love to know what I'm doing wrong here! Thanks!