Date_histogram buckets not as expected

So, I have a query for a date_histogram for which, if I set the interval to 1d, the response is:
"aggregations": {
"received": {
"buckets": [
{
"key_as_string": "2017-01-11T00:00:00.000Z",
"key": 1484092800000,
"doc_count": 2
},
{
"key_as_string": "2017-01-12T00:00:00.000Z",
"key": 1484179200000,
"doc_count": 1
},
{
"key_as_string": "2017-01-13T00:00:00.000Z",
"key": 1484265600000,
"doc_count": 2
}
]
}
}

The query for the above result is:

{
    "size": 0,
  "query": {
    "bool": {
      "must": [
        <!----Omitted sensitive data---->
        {
          "range": {
            "received": {
              "gte": "1483228800000",
              "lte": "1485907140000"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "received": {
      "date_histogram": {
        "field": "received",
        "interval": "1d"
      }
    }
  }
}

The aggregations are
Problem is when I change the "interval" to "7d" the result is:
"aggregations": {
"received": {
"buckets": [
{
"key_as_string": "2017-01-05T00:00:00.000Z",
"key": 1483574400000,
"doc_count": 2
},
{
"key_as_string": "2017-01-12T00:00:00.000Z",
"key": 1484179200000,
"doc_count": 3
}
]
}
}

I expected the result to be the following since the search from Jan 1 to Jan 31 2017 and I want it to bucket for 7 days. So it should bucket Jan 1-Jan 7, Jan 8-Jan 14, Jan 15-Jan 21, ...:
"aggregations": {
"received": {
"buckets": [
{
"key_as_string": "2017-01-08T00:00:00.000Z",
"key": 1512691200000,
"doc_count": 5
}
]
}
}

Can someone explain me why that isn't the case and what is wrong with my understanding?

Anyone has a clue as to why the buckets are working like this?

Hi,
if you use 7d intervals, they cannot sum up to a whole month and still return only buckets with alwasy 7d length. You'd end up with a bucket at the end of the month with less days in it. The way intervals work, they don't take into accout month/year boundaries.

Hi Christoph,

Thanks for the reply. My problem isn't with the last bucket, but with the first one. I expected the first bucket to be Jan 1-Jan 7 and second bucket to be Jan 8-Jan 15, since the search date for my result is Jan 1-Jan30.

Data exists for following dates:
Jan 11: 2 records
Jan 12: 1 record
Jan 13: 2 records

Hence, I expected the result to be:
Jan 8: doc_count: 5

But, it does:
Jan 5: doc_count: 2
Jan 12: doc_count: 3

It almost seems like it looks for the first record and then subtracts 7 out of it to create the first bucket and after that it just moves with 7 day increments.

That might be coincidence. The fact remains that intervals are agnostic to month/year start etc... Have you tried weeks ('1w') instead? That might take into account calendars.

I will try that out and also do some testing as to what happens if I add data prior to Jan 11. I will update the thread with my findings. Thanks for insight Christoph.

I tried 1w, you will get buckets starting in the respective calendar weeks on monday:

  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 1,
        "_source": {
          "date": "2017-03-06"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 1,
        "_source": {
          "date": "2017-01-05"
        }
      }
    ]
  },
"aggregations": {
    "received": {
      "buckets": [
        {
          "key_as_string": "2017-01-02T00:00:00.000Z",
          "key": 1483315200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2017-01-09T00:00:00.000Z",
          "key": 1483920000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-01-16T00:00:00.000Z",
          "key": 1484524800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-01-23T00:00:00.000Z",
          "key": 1485129600000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-01-30T00:00:00.000Z",
          "key": 1485734400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-02-06T00:00:00.000Z",
          "key": 1486339200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-02-13T00:00:00.000Z",
          "key": 1486944000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-02-20T00:00:00.000Z",
          "key": 1487548800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-02-27T00:00:00.000Z",
          "key": 1488153600000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-03-06T00:00:00.000Z",
          "key": 1488758400000,
          "doc_count": 1
        }
      ]
    }
  }

My requirement is to bucket it in weeks (7 days) starting from the beginning date of my search. So if the user searches Jan 1-Jan 30, it should bucket jan 1-Jan7, Jan 8-Jan 15,...Any idea how we can do that?

Did some more testing and seems like if I use "7d" interval it is bucketing it from Thursday to following Wednesday. Not sure if that's configurable or what's causing it. For now I guess "1w" is a better interval to work with.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.