Date histogram aggregation with extended bounds returning too many buckets

azcn2503 · April 20, 2017, 1:50pm

I want my query to do this:

"Show me the last 28 days of some data, grouped in 7 day intervals"

I would expect to get 4 buckets, with each bucket representing one week of data for a total of 28 days. But instead, I am getting 5 buckets equivalent to 35 days.

My query looks like this:

{
    "size": 0,
    "aggs": {
        "metrics:histogram": {
            "date_histogram": {
                "field": "some.timestamp",
                "interval": "7d",
                "min_doc_count": 0,
                "extendedBounds": {
                    "min": 1490278007355,
                    "max": 1492693607355
                }
            },
            "aggs": {}
        }
    },
    "query": {
        "filtered": {
            "filter": {
                "and": [
                    {
                        "range": {
                            "some.timestamp": {
                                "gt": 1490278007355,
                                "lte": 1492693607355
                            }
                        }
                    },
                    {
                        "terms": {
                            "some.array": [
                                123,
                                456,
                                789
                            ]
                        }
                    }
                ]
            }
        }
    }
}

My response looks like this:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 72198,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "metrics:histogram": {
      "buckets": [
        {
          "key_as_string": "1490227200000",
          "key": 1490227200000,
          "doc_count": 0
        },
        {
          "key_as_string": "1490832000000",
          "key": 1490832000000,
          "doc_count": 0
        },
        {
          "key_as_string": "1491436800000",
          "key": 1491436800000,
          "doc_count": 273
        },
        {
          "key_as_string": "1492041600000",
          "key": 1492041600000,
          "doc_count": 64681
        },
        {
          "key_as_string": "1492646400000",
          "key": 1492646400000,
          "doc_count": 7244
        }
      ]
    }
  }
}

As you can see, the first bucket is one week before the query filter and histogram extended bounds configuration.

Why is this? And how can I prevent it?

Many thanks.

jpountz · April 20, 2017, 3:27pm

This looks like the expected behaviour to me. The lowest date that would match the range is 1490278007356 (March 23rd) and the highest date value that could match the range is 1492693607355 (April 20th), so you need buckets for March 23rd, March 30th, April 6th, April 13th and April 20th.

This happens because the lower and upper bounds of your range query are not rounded, so in spite of the fact that your range matches 28 days, it could match 5 7-days intervals. I believe that you would like to do something like this (not thouroughly tested):

GET _search
{
  "size": 0,
  "aggs": {
    "metrics:histogram": {
      "date_histogram": {
        "field": "some.timestamp",
        "interval": "7d",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "1490278007355",
          "max": "1492693607355||-7d/d"
        }
      },
      "aggs": {}
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "some.timestamp": {
              "gt": "1490278007355",
              "lte": "1492693607355"
            }
          }
        },
        {
          "terms": {
            "some.array": [
              123,
              456,
              789
            ]
          }
        }
      ]
    }
  }
}

system · May 18, 2017, 3:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Date_histogram returning wrong number of buckets Elasticsearch	2	361	March 18, 2020
Date_histogram buckets not as expected Elasticsearch	10	910	March 30, 2017
Date histogram doesn't create buckets for dates later than the last record found, even with 'extended_bounds' min value of 0. How do I fix this? Elasticsearch	4	460	November 20, 2019
Unexpected behaviour of date histogram aggregation Elasticsearch	4	1104	June 14, 2017
Extended_bounds causes extra bucket Elasticsearch	3	419	August 12, 2019

Date histogram aggregation with extended bounds returning too many buckets

Related topics