Empty buckets return different type of result value '0' and 'null' for the same field


(Guanghaofan) #1

Kibana: 6.2.5

Metric Type: Sum, the response return value '0' for the empty buckets.
Metric Type: Min/Max/Average, the response return value 'null' for the empty buckets.
Below the request and response:

Request:
{
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"terms": {
"field": "x_coord",
"size": 50,
"order": {
"_term": "desc"
}
},
"aggs": {
"3": {
"histogram": {
"field": "y_coord",
"interval": 1,
"min_doc_count": 0,
"extended_bounds": {
"min": 1,
"max": 22
}
},
"aggs": {
"1": {
"sum": {
"field": "Fcnt"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [],
"should": [],
"must_not": []
}
}
}
And the response like:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 246,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "3": {
            "buckets": [
              {
                "1": {
                  **"value": 0**
                },
                "key": 1,
                "doc_count": 0
              },
              {
                "1": {
                  **"value": 0**
                },
                "key": 2,
                "doc_count": 0
              },
              {
                "1": {
                  **"value": 0**
                },
               ...
               ...

But while the Metric Type to "Min"
Request:
{
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"terms": {
"field": "x_coord",
"size": 50,
"order": {
"_term": "desc"
}
},
"aggs": {
"3": {
"histogram": {
"field": "y_coord",
"interval": 1,
"min_doc_count": 0,
"extended_bounds": {
"min": 1,
"max": 22
}
},
"aggs": {
"1": {
"min": {
"field": "Fcnt"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [],
"should": [],
"must_not": []
}
}
}
And the response:
> {

      "took": 8,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 246,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "2": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "3": {
                "buckets": [
                  {
                    "1": {
                      **"value": null**
                    },
                    "key": 1,
                    "doc_count": 0
                  },
                  {
                    "1": {
                      **"value": null**
                    },
                    "key": 2,
                    "doc_count": 0
                  },
    ....
    ....

this two kinds of empty results:

value '0' mixed with the none empty bucket value '0', right? now while using the histogram, the empty bucket data just mix with the valid value '0' with 'sum' type metric.
but if value 'null', we know that this is from the empty bucket and would not mix with the none empty bucket data.


(Thomas Neirynck) #2

hi @guanghaofan,

this is interesting. I assume because summing is a form of counting, it uses a default of 0, which remains semantically correct, whereas min/max/average are meaningless if there is no sample data to compute these values from.

I would bring this up as an issue in the Elasticsearch project https://github.com/elastic/elasticsearch/issues/new. The mismatch in behavior seems like an inconsistency at first, but maybe there is a valid reason. A maintainer will be able to confirm that for you or not.


(Guanghaofan) #3

yes, that's interesting, and not very clear about the reason behind this design. in my opinion I think only the metric type Count should return Value '0' for the empty bucket and all the others 'Sum/Max/Min/Avg/Mean...' should return value 'null'.
And actually the user would be very confused for the visulations which enable the empty bucket if there're both metric type Sum/Min in their aggs.
and pelase follow up this since it really results in some unexpected behaviors and all the valid metric data result '0' are mixed with the empty no-valid resut '0' if using 'Sum' in their aggs in the kibana visuliuze. now kibana only can hide the empty buckets only with value 'null'. and maybe someone can help talk this to the kibana developers and then we can get a good understanding to it.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.