Date histogram aggregation over duration from startDate to endDate

Hi, I m trying to create an elastic query that should return all fields in each bucket using date histogram builder.

I have an index that has thresholdValue with a start date and end date of 2/3 months period. Now I want to retrieve thresholdValue in each month bucket.

elastic index doc:

{
"type":"GEN"
"startDate":"2022-01-01",
"endDate":"2022-02-28",
"thresholdValue":"1"
},
{
"type":"GEN"
"startDate":"2022-03-01",
"endDate":"2022-05-31",
"thresholdValue":"1.5"
}

Query:

POST /rating/_search
{
  "aggs": {
    "thresholdValue": {
      "date_histogram": {
        "field": "endDate",
        "format": "yyyy-MM-dd",
        "time_zone": "+05:30",
        "calendar_interval": "1M",
        "offset": 0,
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01",
          "max": "2022-03-31"
        }
      },
      "aggregations": {
        "thresholdValue": {
          "sum": {
            "field": "thresholdValue"
          }
        }
      }
    }
  }
}

Result I m getitng is:

"aggregations" : {
    "thresholdValue" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01",
          "key" : 1640975400000,
          "doc_count" : 5,
          "thresholdValue" : {
            "value" : 1.0
          }
        },
        {
          "key_as_string" : "2022-02-01",
          "key" : 1643653800000,
          "doc_count" : 0,
          "thresholdValue" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-03-01",
          "key" : 1646073000000,
          "doc_count" : 0,
          "thresholdValue" : {
            "value" : 1.5
          }
        }
      ]
    }
  }
}

My expecting result is 2nd bucket should also have thresholdValue 1.0 and the 4th bucket should have a value 1.5 as it matches under start and end date. but it is considering value in the 1st bucket only.

Thanks:)

Date histogram buckets include start but not include end.
As it is histogram, each documents are divided into some single buckets. They shoud not be duplicated.

Hi @Tomo_M, Thanks for your quick reply. Is there any alternative to achieve this.

you can use Date range define your own date range

1 Like

such as :

POST test/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "ranges": [
          {
            "from": "2015-07-30T23:33:09",
            "to": "2015-08-30T23:33:09"
          },
          {
            "from": "2015-07-23T23:33:09",
            "to": "2015-08-31T23:33:09"
          },
          {
            "from": "2015-08-02T23:33:09",
            "to": "2015-09-30T23:33:09"
          }
        ]
      }
    }
  }
}

Date range aggregation
A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal range aggregation is that the from and to values can be expressed in Date Math expressions, and it is also possible to specify a date format by which the from and to response fields will be returned. Note that this aggregation includes the from value and excludes the to value for each range.
Date range aggregation | Elasticsearch Guide [8.11] | Elastic

The document says Date range aggregation also excludes to value. It is the same as Range aggregation.

I suppose using Filters aggregation with Range query is better. In range query, you can use gte and lte to include the limit values.

Or it is possible using something like {to: 2015-08-30T00:00:00.001} with Date range aggregation.

I think I made the question complicated. My requirement is not only to include the end date.

I have an index doc that has a threshold data time period-wise.

{
StartDate: 2022-01-01
endDate: 2022-03-31
Threshold: 1.5
}
{
StartDate: 2022-04-01
endDate: 2022-12-31
Threshold: 2.7
}

Now I want to retrieve the threshold in monthly or weekly buckets.

{
bucket:[
2022-01-01:{
Threshold: 1.5
},
2022-02-01:{
Threshold: 1.5
},
2022-03-01:{
Threshold: 1.5
},
2022-04-01:{
Threshold: 2.7
},
2022-05-01:{
Threshold: 2.7
}
]
}

I got what you want, but it is a bit far from date histogram aggregation on date fields.
It is not only aggregation but it need flattening or some reconstructing the data.

For such case, Range field type is a possible option.

PUT /test_date_range/
{
  "mappings": {
    "properties": {
      "dateRange":{
        "type":"date_range"
      },
      "val": {"type":"float"}
    }
  }
}

POST test_date_range/_bulk
{"index":{}}
{"val":1.5,"dateRange":{"gte":"2022-01-01","lte":"2022-03-31"}}
{"index":{}}
{"val":2.7,"dateRange":{"gte":"2022-04-01","lte":"2022-12-31"}}
{"index":{}}
{"val":2.0,"dateRange":{"gte":"2022-01-01","lte":"2022-02-15"}}
{"index":{}}
{"val":2.5,"dateRange":{"gte":"2022-02-15","lte":"2022-12-31"}}

GET test_date_range/_search

GET test_date_range/_search?filter_path=aggregations.date.buckets.key_as_string,aggregations.date.buckets.max.value
{
  "size":0,
  "aggs": {
    "date": {
      "date_histogram": {
        "field": "dateRange",
        "calendar_interval": "month"
      },
      "aggs": {
        "max": {
          "max": {
            "field": "val"
          }
        }
      }
    }
  }
}
{
  "aggregations" : {
    "date" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01T00:00:00.000Z",
          "max" : {
            "value" : 2.0
          }
        },
        {
          "key_as_string" : "2022-02-01T00:00:00.000Z",
          "max" : {
            "value" : 2.5
          }
        },
        {
          "key_as_string" : "2022-03-01T00:00:00.000Z",
          "max" : {
            "value" : 2.5
          }
        },
        {
          "key_as_string" : "2022-04-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-05-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-06-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-07-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-08-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-09-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-10-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-11-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        },
        {
          "key_as_string" : "2022-12-01T00:00:00.000Z",
          "max" : {
            "value" : 2.700000047683716
          }
        }
      ]
    }
  }
}

Though it is not documented, the relation between date_range field and date histogram bucket range is limited to "INTERSECTS".

Range queries over range fields support relation parameter which can be one of WITHIN , CONTAINS , INTERSECTS (default). There is no such parameter in Histogram aggregation over range fields, however. (Maybe it is worth creating feature request in Github repository, if you want.)

(To check this behavior, I added ranges start/end in the middle of month.)

Thank you So much @Tomo_M for your help. This solved my problem:)

1 Like

I'm glad to hear my answer help you. If you satisfied with my answer, please check as Solution. Thanks!

it also worked for me, thanks for the useful answers.

1 Like

actually i also had same issue.. .

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.