QDSL Query: date_histogram aggregating the results of a date_histogram

Hi,

I would like to stack two date histograms, but I have no clue how to do it...

Say it's the traffic going through an interface, and I want the daily or hourly peaks of the data transferred within any 60-minute window.

For the peaks it is not enough to just select hourly values by block hour (11:00:00 -- 11:59:59, 12:00:00 -- 12:59:59); the granularity must be finer. I need one value for every 60-minute window with a step width of 10 minutes (11:10:00 -- 12:09:59, 11:20:00 -- 12:19:59, 11:30:00 -- 12:29:59, and so on). So there are 144 values per day (24 hours * 6 windows per hour).

In the same request, I would like to aggregate these results into the maximum values per hour/day/month; here, for a start, per hour.

Given data:

POST test/_bulk
{"index": {"_index": "test"}}
{"time": "2022-04-01T10:55:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T11:12:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T11:18:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:25:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:35:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:45:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T13:15:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T13:25:00Z", "size": 100}

Step 1: run a date_histogram with a fixed_interval of 10 minutes and return the sum of size.
Step 2: calculate the rolling 60-minute sum of these values using a window function.

POST /test/_search?size=0
{
  "aggs": {
    "sum_10mins": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "10m"
      },
      "aggs": {
        "size_sum": {
          "sum": {
            "field": "size"
          }
        },
        "roll_size": {
          "moving_fn": {
            "buckets_path": "size_sum",
            "window": 6,
            "shift": 1,
            "script": "MovingFunctions.sum(values)",
            "gap_policy": "keep_values"
          }
        }
      }
    }
  }
}

This returns what I expect: for each 10-minute interval, the sum of all size values over the trailing 60 minutes:

{  
  "aggregations" : {
    "sum_10mins" : {
      "buckets" : [
        {
          "key_as_string" : "2022-04-01T10:50:00.000Z",
          "key" : 1648810200000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "roll_size" : {
            "value" : 100.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:00:00.000Z",
          "key" : 1648810800000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 100.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:10:00.000Z",
          "key" : 1648811400000,
          "doc_count" : 2,
          "size_sum" : {
            "value" : 200.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:20:00.000Z",
          "key" : 1648812000000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:30:00.000Z",
          "key" : 1648812600000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:40:00.000Z",
          "key" : 1648813200000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:50:00.000Z",
          "key" : 1648813800000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 200.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:00:00.000Z",
          "key" : 1648814400000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 200.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:10:00.000Z",
          "key" : 1648815000000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:20:00.000Z",
          "key" : 1648815600000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "roll_size" : {
            "value" : 100.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:30:00.000Z",
          "key" : 1648816200000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "roll_size" : {
            "value" : 200.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:40:00.000Z",
          "key" : 1648816800000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:50:00.000Z",
          "key" : 1648817400000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T13:00:00.000Z",
          "key" : 1648818000000,
          "doc_count" : 0,
          "size_sum" : {
            "value" : 0.0
          },
          "roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T13:10:00.000Z",
          "key" : 1648818600000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "roll_size" : {
            "value" : 400.0
          }
        },
        {
          "key_as_string" : "2022-04-01T13:20:00.000Z",
          "key" : 1648819200000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "roll_size" : {
            "value" : 400.0
          }
        }
      ]
    }
  }
}

Step 3 (the problem): aggregate these buckets and return the maximum roll_size per hour. This is what I expect:

{
  "aggregations" : {
    "sum_10mins" : {
      "buckets" : [
        {
          "key_as_string" : "2022-04-01T10:00:00.000Z",
          "key" : 1648807200000,
          "doc_count" : 1,
          "size_sum" : {
            "value" : 100.0
          },
          "max_roll_size" : {
            "value" : 100.0
          }
        },
        {
          "key_as_string" : "2022-04-01T11:00:00.000Z",
          "key" : 1648810800000,
          "doc_count" : 2,
          "size_sum" : {
            "value" : 200.0
          },
          "max_roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T12:00:00.000Z",
          "key" : 1648814400000,
          "doc_count" : 3,
          "size_sum" : {
            "value" : 300.0
          },
          "max_roll_size" : {
            "value" : 300.0
          }
        },
        {
          "key_as_string" : "2022-04-01T13:00:00.000Z",
          "key" : 1648818000000,
          "doc_count" : 2,
          "size_sum" : {
            "value" : 200.0
          },
          "max_roll_size" : {
            "value" : 400.0
          }
        }
      ]
    }
  }
}
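
The closest I have found so far is a sibling max_bucket pipeline. If I read the pipeline docs correctly, pipeline aggregations can be chained, so max_bucket should accept the moving_fn output as its buckets_path (max_roll_size is just a name I made up):

POST /test/_search?size=0
{
  "aggs": {
    "sum_10mins": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "10m"
      },
      "aggs": {
        "size_sum": {
          "sum": {
            "field": "size"
          }
        },
        "roll_size": {
          "moving_fn": {
            "buckets_path": "size_sum",
            "window": 6,
            "shift": 1,
            "script": "MovingFunctions.sum(values)",
            "gap_policy": "keep_values"
          }
        }
      }
    },
    "max_roll_size": {
      "max_bucket": {
        "buckets_path": "sum_10mins>roll_size"
      }
    }
  }
}

But as far as I can tell, this collapses everything into a single overall maximum for the whole queried range, not one value per hour.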

I tried different places for the next aggregation: as a sibling of the first date_histogram aggregation, or as a child. Here, as a sibling:

POST /test/_search?size=0
{
  "aggs": {
    "sum_10mins": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "10m"
      },
      "aggs": {
        "size_sum": {
          "sum": {
            "field": "size"
          }
        },
        "roll_size": {
          "moving_fn": {
            "buckets_path": "size_sum",
            "window": 6,
            "shift": 1,
            "script": "MovingFunctions.sum(values)",
            "gap_policy": "keep_values"
          }
        }
      }
    }, 
    "max_size": {
      "date_histogram": {
        "field": "aggregations.sum_10mins.buckets.key",
        "interval": "day"
      }
    }
  }
}

But this only returns an empty bucket.
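
For the child placement, the only structure I could come up with is nesting the whole 10-minute histogram under an hourly histogram and asking for a max_bucket per hour (per_hour and max_roll_size are again just names I made up). This is only a sketch: as far as I understand pipeline scoping, the inner moving_fn only sees the 10-minute buckets inside its own parent hour, so rolling windows that should span an hour boundary get truncated and the maxima near the boundaries come out too low:

POST /test/_search?size=0
{
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "hour"
      },
      "aggs": {
        "sum_10mins": {
          "date_histogram": {
            "field": "time",
            "fixed_interval": "10m"
          },
          "aggs": {
            "size_sum": {
              "sum": {
                "field": "size"
              }
            },
            "roll_size": {
              "moving_fn": {
                "buckets_path": "size_sum",
                "window": 6,
                "shift": 1,
                "script": "MovingFunctions.sum(values)",
                "gap_policy": "keep_values"
              }
            }
          }
        },
        "max_roll_size": {
          "max_bucket": {
            "buckets_path": "sum_10mins>roll_size"
          }
        }
      }
    }
  }
}

That boundary truncation is exactly the block-hour problem I am trying to avoid, so this does not solve it either.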

Any ideas are welcome.

If it just does not work in Query DSL, is there any other way? For example, an ingest pipeline doing steps 1 and 2 so that only step 3 remains in the search request, or whatever else. I am quite open to any suggestions; the goal is to get the results and show them in Kibana. A single Elasticsearch request would be the preferred way, and the fewer changes to the original docs, the better.

Regards
Sebastian
