Hi,
I would like to stack two date-histograms, but I do not have a clue how to do it...
Say it is the traffic going through an interface, and I want the daily or hourly peak of the amount of data transferred within 60 minutes.
For the peaks it is not enough to just select hourly values by block hour (11:00:00 -- 11:59:59, 12:00:00 -- 12:59:59); the granularity must be finer. I need a value for every 60-minute window with a step width of 10 minutes (11:10:00--12:09:59, 11:20:00--12:19:59, 11:30:00--12:29:59 and so on). So there are 144 values per day (24 hours * 6 values per hour).
In the same request, I would like to aggregate these results into the maximum values per hour/day/month - here, for a start, per hour (see the sketch after the sample data below).
Given data:
POST test/_bulk
{"index": {"_index": "test"}}
{"time": "2022-04-01T10:55:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T11:12:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T11:18:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:25:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:35:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:45:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T13:15:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T13:25:00Z", "size": 100}
Step 1: do a date_histogram with a fixed_interval of 10 minutes and return the sum of size
Step 2: calculate the rolling 60-minute sum of these values with a window function (moving_fn)
POST /test/_search?size=0
{
"aggs": {
"sum_10mins": {
"date_histogram": {
"field": "time",
"fixed_interval": "10m"
},
"aggs": {
"size_sum": {
"sum": {
"field": "size"
}
},
"roll_size": {
"moving_fn": {
"buckets_path": "size_sum",
"window": 6,
"shift": 1,
"script": "MovingFunctions.sum(values)",
"gap_policy": "keep_values"
}
}
}
}
}
}
This returns what I expect: for each 10-minute bucket, the sum of size over the 60 minutes ending with that bucket (the bucket itself plus the five preceding ones, due to window: 6 and shift: 1):
{
"aggregations" : {
"sum_10mins" : {
"buckets" : [
{
"key_as_string" : "2022-04-01T10:50:00.000Z",
"key" : 1648810200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T11:00:00.000Z",
"key" : 1648810800000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T11:10:00.000Z",
"key" : 1648811400000,
"doc_count" : 2,
"size_sum" : {
"value" : 200.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:20:00.000Z",
"key" : 1648812000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:30:00.000Z",
"key" : 1648812600000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:40:00.000Z",
"key" : 1648813200000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:50:00.000Z",
"key" : 1648813800000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 200.0
}
},
{
"key_as_string" : "2022-04-01T12:00:00.000Z",
"key" : 1648814400000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 200.0
}
},
{
"key_as_string" : "2022-04-01T12:10:00.000Z",
"key" : 1648815000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-04-01T12:20:00.000Z",
"key" : 1648815600000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T12:30:00.000Z",
"key" : 1648816200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 200.0
}
},
{
"key_as_string" : "2022-04-01T12:40:00.000Z",
"key" : 1648816800000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T12:50:00.000Z",
"key" : 1648817400000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T13:00:00.000Z",
"key" : 1648818000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T13:10:00.000Z",
"key" : 1648818600000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 400.0
}
},
{
"key_as_string" : "2022-04-01T13:20:00.000Z",
"key" : 1648819200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 400.0
}
}
]
}
}
}
Step 3 (the problem): aggregate these buckets per hour and return the maximum roll_size for each hour. This is what I expect:
{
"aggregations" : {
"sum_10mins" : {
"buckets" : [
{
"key_as_string" : "2022-04-01T10:00:00.000Z",
"key" : 1648810200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"max_roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T11:00:00.000Z",
"key" : 1648810800000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T12:00:00.000Z",
"key" : 1648814400000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T13:00:00.000Z",
"key" : 1648818000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_roll_size" : {
"value" : 400.0
}
}
]
}
}
}
I tried placing the next aggregation in different spots, as a sibling of the first date_histogram aggregation and as a child of it. Here, as a sibling:
POST /test/_search?size=0
{
"aggs": {
"sum_10mins": {
"date_histogram": {
"field": "time",
"fixed_interval": "10m"
},
"aggs": {
"size_sum": {
"sum": {
"field": "size"
}
},
"roll_size": {
"moving_fn": {
"buckets_path": "size_sum",
"window": 6,
"shift": 1,
"script": "MovingFunctions.sum(values)",
"gap_policy": "keep_values"
}
}
}
},
"max_size": {
"date_histogram": {
"field": "aggregations.sum_10mins.buckets.key",
"interval": "day"
}
}
}
}
But this only returns an empty bucket - presumably because the field of a date_histogram has to be a field of the documents and cannot point at the buckets of another aggregation.
Any ideas are welcome.
If it just cannot be done in the Query DSL, is there any other way? For example, ingest pipelines to do steps 1 and 2 and then only step 3 in the Query DSL request, or whatever else. I am quite open to any suggestions - the goal is to get the results and show them in Kibana. A single Elasticsearch request would be the preferred way, and the fewer changes to the original docs, the better.
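If nothing better turns up, the bluntest fallback I can think of is to run only steps 1 and 2 in Elasticsearch and do step 3 in a small client-side script before the data reaches Kibana. A rough sketch (Python, purely illustrative, assuming the response shape shown above):
from collections import defaultdict

def hourly_max_roll_size(response):
    # step 3 on the client: maximum roll_size per hour, taken from the
    # response of the step 1/2 request above
    maxima = defaultdict(float)
    for bucket in response["aggregations"]["sum_10mins"]["buckets"]:
        hour = bucket["key_as_string"][:13]   # e.g. "2022-04-01T13"
        maxima[hour] = max(maxima[hour], bucket["roll_size"]["value"])
    return dict(maxima)

# with the response above: {"2022-04-01T10": 100.0, "2022-04-01T11": 300.0,
#                           "2022-04-01T12": 300.0, "2022-04-01T13": 400.0}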
Regards
Sebastian