Date histogram aggregation with extended bounds returning too many buckets


(Aaron Cunnington) #1

I want my query to do this:

"Show me the last 28 days of some data, grouped in 7 day intervals"

I would expect to get 4 buckets, with each bucket representing one week of data for a total of 28 days. But instead, I am getting 5 buckets equivalent to 35 days.

My query looks like this:

{
    "size": 0,
    "aggs": {
        "metrics:histogram": {
            "date_histogram": {
                "field": "some.timestamp",
                "interval": "7d",
                "min_doc_count": 0,
                "extendedBounds": {
                    "min": 1490278007355,
                    "max": 1492693607355
                }
            },
            "aggs": {}
        }
    },
    "query": {
        "filtered": {
            "filter": {
                "and": [
                    {
                        "range": {
                            "some.timestamp": {
                                "gt": 1490278007355,
                                "lte": 1492693607355
                            }
                        }
                    },
                    {
                        "terms": {
                            "some.array": [
                                123,
                                456,
                                789
                            ]
                        }
                    }
                ]
            }
        }
    }
}

My response looks like this:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 72198,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "metrics:histogram": {
      "buckets": [
        {
          "key_as_string": "1490227200000",
          "key": 1490227200000,
          "doc_count": 0
        },
        {
          "key_as_string": "1490832000000",
          "key": 1490832000000,
          "doc_count": 0
        },
        {
          "key_as_string": "1491436800000",
          "key": 1491436800000,
          "doc_count": 273
        },
        {
          "key_as_string": "1492041600000",
          "key": 1492041600000,
          "doc_count": 64681
        },
        {
          "key_as_string": "1492646400000",
          "key": 1492646400000,
          "doc_count": 7244
        }
      ]
    }
  }
}

As you can see, the first bucket is one week before the query filter and histogram extended bounds configuration.

Why is this? And how can I prevent it?

Many thanks.


(Adrien Grand) #2

This looks like the expected behaviour to me. The lowest date that would match the range is 1490278007356 (March 23rd) and the highest date value that could match the range is 1492693607355 (April 20th), so you need buckets for March 23rd, March 30th, April 6th, April 13th and April 20th.

This happens because the lower and upper bounds of your range query are not rounded, so in spite of the fact that your range matches 28 days, it could match 5 7-days intervals. I believe that you would like to do something like this (not thouroughly tested):

GET _search
{
  "size": 0,
  "aggs": {
    "metrics:histogram": {
      "date_histogram": {
        "field": "some.timestamp",
        "interval": "7d",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "1490278007355",
          "max": "1492693607355||-7d/d"
        }
      },
      "aggs": {}
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "some.timestamp": {
              "gt": "1490278007355",
              "lte": "1492693607355"
            }
          }
        },
        {
          "terms": {
            "some.array": [
              123,
              456,
              789
            ]
          }
        }
      ]
    }
  }
}

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.