Date histogram aggregation over duration from startDate to endDate

truptir · February 22, 2022, 7:16pm

Hi, I m trying to create an elastic query that should return all fields in each bucket using date histogram builder.

I have an index that has thresholdValue with a start date and end date of 2/3 months period. Now I want to retrieve thresholdValue in each month bucket.

elastic index doc:

{
"type":"GEN"
"startDate":"2022-01-01",
"endDate":"2022-02-28",
"thresholdValue":"1"
},
{
"type":"GEN"
"startDate":"2022-03-01",
"endDate":"2022-05-31",
"thresholdValue":"1.5"
}

Query:

POST /rating/_search
{
  "aggs": {
    "thresholdValue": {
      "date_histogram": {
        "field": "endDate",
        "format": "yyyy-MM-dd",
        "time_zone": "+05:30",
        "calendar_interval": "1M",
        "offset": 0,
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01",
          "max": "2022-03-31"
        }
      },
      "aggregations": {
        "thresholdValue": {
          "sum": {
            "field": "thresholdValue"
          }
        }
      }
    }
  }
}

Result I m getitng is:

"aggregations" : {
    "thresholdValue" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01",
          "key" : 1640975400000,
          "doc_count" : 5,
          "thresholdValue" : {
            "value" : 1.0
          }
        },
        {
          "key_as_string" : "2022-02-01",
          "key" : 1643653800000,
          "doc_count" : 0,
          "thresholdValue" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-03-01",
          "key" : 1646073000000,
          "doc_count" : 0,
          "thresholdValue" : {
            "value" : 1.5
          }
        }
      ]
    }
  }
}

My expecting result is 2nd bucket should also have thresholdValue 1.0 and the 4th bucket should have a value 1.5 as it matches under start and end date. but it is considering value in the 1st bucket only.

Thanks:)

Tomo_M · February 22, 2022, 7:30pm

Date histogram buckets include start but not include end.
As it is histogram, each documents are divided into some single buckets. They shoud not be duplicated.

truptir · February 23, 2022, 4:58am

Hi @Tomo_M, Thanks for your quick reply. Is there any alternative to achieve this.

casterQ · February 23, 2022, 10:18am

you can use Date range define your own date range

casterQ · February 23, 2022, 10:21am

such as :

POST test/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "ranges": [
          {
            "from": "2015-07-30T23:33:09",
            "to": "2015-08-30T23:33:09"
          },
          {
            "from": "2015-07-23T23:33:09",
            "to": "2015-08-31T23:33:09"
          },
          {
            "from": "2015-08-02T23:33:09",
            "to": "2015-09-30T23:33:09"
          }
        ]
      }
    }
  }
}

Tomo_M · February 23, 2022, 10:27am

Date range aggregation
A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal range aggregation is that the from and to values can be expressed in Date Math expressions, and it is also possible to specify a date format by which the from and to response fields will be returned. Note that this aggregation includes the from value and excludes the to value for each range.
Date range aggregation | Elasticsearch Guide [8.11] | Elastic

The document says Date range aggregation also excludes to value. It is the same as Range aggregation.

I suppose using Filters aggregation with Range query is better. In range query, you can use gte and lte to include the limit values.

Tomo_M · February 23, 2022, 10:29am

Or it is possible using something like {to: 2015-08-30T00:00:00.001} with Date range aggregation.

truptir · February 23, 2022, 1:46pm

I think I made the question complicated. My requirement is not only to include the end date.

I have an index doc that has a threshold data time period-wise.

{
StartDate: 2022-01-01
endDate: 2022-03-31
Threshold: 1.5
}
{
StartDate: 2022-04-01
endDate: 2022-12-31
Threshold: 2.7
}

Now I want to retrieve the threshold in monthly or weekly buckets.

{
bucket:[
2022-01-01:{
Threshold: 1.5
},
2022-02-01:{
Threshold: 1.5
},
2022-03-01:{
Threshold: 1.5
},
2022-04-01:{
Threshold: 2.7
},
2022-05-01:{
Threshold: 2.7
}
]
}

Tomo_M · February 23, 2022, 2:21pm

I got what you want, but it is a bit far from date histogram aggregation on date fields.
It is not only aggregation but it need flattening or some reconstructing the data.

For such case, Range field type is a possible option.

PUT /test_date_range/
{
  "mappings": {
    "properties": {
      "dateRange":{
        "type":"date_range"
      },
      "val": {"type":"float"}
    }
  }
}

POST test_date_range/_bulk
{"index":{}}
{"val":1.5,"dateRange":{"gte":"2022-01-01","lte":"2022-03-31"}}
{"index":{}}
{"val":2.7,"dateRange":{"gte":"2022-04-01","lte":"2022-12-31"}}
{"index":{}}
{"val":2.0,"dateRange":{"gte":"2022-01-01","lte":"2022-02-15"}}
{"index":{}}
{"val":2.5,"dateRange":{"gte":"2022-02-15","lte":"2022-12-31"}}

GET test_date_range/_search

GET test_date_range/_search?filter_path=aggregations.date.buckets.key_as_string,aggregations.date.buckets.max.value
{
  "size":0,
  "aggs": {
    "date": {
      "date_histogram": {
        "field": "dateRange",
        "calendar_interval": "month"
      },
      "aggs": {
        "max": {
          "max": {
            "field": "val"
          }
        }
      }
    }
  }
}

{
  "aggregations" : {
    "date" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01T00:00:00.000Z",
          "max" : {
            "value" : 2.0
          }
        },
        {
          "key_as_string" : "2022-02-01T00:00:00.000Z",
          "max" : {
            "value" : 2.5
          }
        },
        {
          "key_as_string" : "2022-03-01T00:00:00.000Z",
          "max" : {
            "value" : 2.5
          }
        },
        {
          "key_as_string" : "2022-04-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-05-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-06-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-07-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-08-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-09-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-10-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-11-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-12-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        }
      ]
    }
  }
}

Though it is not documented, the relation between date_range field and date histogram bucket range is limited to "INTERSECTS".

Range queries over range fields support relation parameter which can be one of WITHIN , CONTAINS , INTERSECTS (default). There is no such parameter in Histogram aggregation over range fields, however. (Maybe it is worth creating feature request in Github repository, if you want.)

(To check this behavior, I added ranges start/end in the middle of month.)

truptir · February 24, 2022, 5:57am

Thank you So much @Tomo_M for your help. This solved my problem:)

Tomo_M · February 24, 2022, 6:30am

I'm glad to hear my answer help you. If you satisfied with my answer, please check as Solution. Thanks!

celalsur · March 4, 2022, 6:40am

it also worked for me, thanks for the useful answers.

celalsur · March 10, 2022, 10:29am

actually i also had same issue.. .

system · April 7, 2022, 10:30am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Date_histogram buckets not as expected Elasticsearch	10	908	March 30, 2017
Date Histogram: Can I force the last bucket to be a full interval? Elasticsearch	4	1467	November 9, 2017
To get end date of week in date histogram Elasticsearch	4	1510	March 1, 2018
Date Histogram empty buckets and number of buckets Elasticsearch	5	2825	July 5, 2017
Elasticsearch date-histogram aggregations default buckets Elasticsearch	1	351	July 2, 2020

Date histogram aggregation over duration from startDate to endDate

Related topics