Date histogram aggregation seems incorrect with calendar_interval and when offset >= 30 days

Based on this doc, elasticsearch supports Calendar-aware interval on data histogram aggregation. Specially for a "quarter" interval:

"One quarter is the interval between the start day of the month and time of day and the same day of the month and time of day three months later, so that the day of the month and time of day are the same at the start and end."

My understanding of this is every bucket in the response should start at the same day, but that's not the case when I use offset >= 30 days. Some buckets start at, say 5th of the month, while one other bucket starts at the 6th of the month. Is this a bug or am I missing anything?

Elasticsearch Version: 7.17
Steps to reproduce:

PUT /test01 
{
  "mappings": {
    "properties": {
      "date": { "type": "date" }
    }
  }
}

POST /test01/_doc { "date": 1642658400000 } // 01/20/2022, 6AM UTC
POST /test01/_doc { "date": 1645336800000 } // 02/20/2022, 6AM UTC
...
POST /test01/_doc { "date": 1660975200000 } // 08/20/2022, 6AM UTC
// basically one document every month from January to August

Queries:

{
    "size": 0,
    "aggs": {
        "ShiftedQuarter": {
            "date_histogram": {
                "calendar_interval": "quarter",
                "min_doc_count": 1,
                "time_zone": "UTC",
                "offset": "+20d", // offset is less than 30 days
                "field": "date"
            }
        }
    }
}

returns below as expected (every bucket starts at the 21st day of the month):

"aggregations": {
    "ShiftedQuarter": {
      "buckets": [
        {
          "key_as_string": "2021-10-21T00:00:00.000Z",
          "key": 1634774400000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-01-21T00:00:00.000Z",
          "key": 1642723200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-04-21T00:00:00.000Z",
          "key": 1650499200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-07-21T00:00:00.000Z",
          "key": 1658361600000,
          "doc_count": 1
        }
      ]
    }
  }

However if the query use a offset that is large than 30, like

{
    "size": 0,
    "aggs": {
        "ShiftedQuarter": {
            "date_histogram": {
                "calendar_interval": "quarter",
                "min_doc_count": 1,
                "time_zone": "UTC",
                "offset": "+35d",
                "field": "date"
            }
        }
    }
}

then some bucket starts at the 5th but some starts at the 6th:

"aggregations": {
    "ShiftedQuarter": {
      "buckets": [
        {
          "key_as_string": "2021-11-05T00:00:00.000Z",
          "key": 1636070400000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-02-05T00:00:00.000Z",
          "key": 1644019200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-05-06T00:00:00.000Z", // 06 as opposed to 05 for other buckets
          "key": 1651795200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-08-05T00:00:00.000Z",
          "key": 1659657600000,
          "doc_count": 1
        }
      ]
    }
  }

Your quote is IMO correct if you work with quarters. But your offset is in days.

The quarters start on the 2021-10-01, 2022-01-01, 2022-04-01,... Then you add +35d and depending if the first month has 31 days (like October or January) or 30 days (April), you'll end up on a different day.

Not sure this is a feature or a bug.

Thanks @xeraa for your response! So our use case is: we have customers whose fiscal quarter could start at, say, the 5th of the second month of a standard calendar quarter, which are 2021-08-05, 2021-11-05, 2022-02-05, 2022-05-05, that's why I need to use more than 30 days in the offset and expecting the start date is the same for each bucket. So IMO this is a bug. Can I submit an issue in gitlab?

I feel like this is more of a new feature: Quarterly interval from date X. IMO you want to operate on quarters as the offset and not +35d but I also have no idea about the implementation details.

But GitHub (not GitLab) should be a good next step.

In the meantime, while more manual work with all the intervals, the date range aggregation might get you the right result?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.