Hi,
I would like to stack two date-histograms, but I do not have a clue how to do it...
Say it is the traffic going through an interface, and I want the daily or hourly peak of the amount of data transferred within 60 minutes.
For the peaks it is not enough to just select hourly values by block hour (11:00:00 -- 11:59:59, 12:00:00 -- 12:59:59); the granularity must be finer. I need a value for every 60-minute window with a step width of 10 minutes (11:10:00--12:09:59, 11:20:00--12:19:59, 11:30:00--12:29:59 and so on). So there are 144 values per day (24 hours * 6 values per hour).
In the same request, I would like to aggregate these results into the maximum values per hour/day/month - here, for a start, per hour (see the sketch after the sample data below).
Given data:
POST test/_bulk
{"index": {"_index": "test"}}
{"time": "2022-04-01T10:55:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T11:12:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T11:18:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:25:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:35:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T12:45:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T13:15:00Z", "size": 100}
{"index": {"_index": "test"}}
{"time": "2022-04-01T13:25:00Z", "size": 100}
Step 1: do a date_histogram with a fixed_interval of 10 minutes and return the sum of size
Step 2: calculate the rolling 60-minute sum of these values with a window function (moving_fn)
POST /test/_search?size=0
{
"aggs": {
"sum_10mins": {
"date_histogram": {
"field": "time",
"fixed_interval": "10m"
},
"aggs": {
"size_sum": {
"sum": {
"field": "size"
}
},
"roll_size": {
"moving_fn": {
"buckets_path": "size_sum",
"window": 6,
"shift": 1,
"script": "MovingFunctions.sum(values)",
"gap_policy": "keep_values"
}
}
}
}
}
}
This returns what I expect: for each 10-minute bucket, the sum of size over the 60 minutes ending with that bucket (the bucket itself plus the five preceding ones, due to window: 6 and shift: 1):
{
"aggregations" : {
"sum_10mins" : {
"buckets" : [
{
"key_as_string" : "2022-04-01T10:50:00.000Z",
"key" : 1648810200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T11:00:00.000Z",
"key" : 1648810800000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T11:10:00.000Z",
"key" : 1648811400000,
"doc_count" : 2,
"size_sum" : {
"value" : 200.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:20:00.000Z",
"key" : 1648812000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:30:00.000Z",
"key" : 1648812600000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:40:00.000Z",
"key" : 1648813200000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T11:50:00.000Z",
"key" : 1648813800000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 200.0
}
},
{
"key_as_string" : "2022-04-01T12:00:00.000Z",
"key" : 1648814400000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 200.0
}
},
{
"key_as_string" : "2022-04-01T12:10:00.000Z",
"key" : 1648815000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-04-01T12:20:00.000Z",
"key" : 1648815600000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T12:30:00.000Z",
"key" : 1648816200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 200.0
}
},
{
"key_as_string" : "2022-04-01T12:40:00.000Z",
"key" : 1648816800000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T12:50:00.000Z",
"key" : 1648817400000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T13:00:00.000Z",
"key" : 1648818000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T13:10:00.000Z",
"key" : 1648818600000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 400.0
}
},
{
"key_as_string" : "2022-04-01T13:20:00.000Z",
"key" : 1648819200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"roll_size" : {
"value" : 400.0
}
}
]
}
}
}
Step 3 (the problem): aggregate these buckets per hour and return the maximum roll_size for each hour. This is what I expect:
{
"aggregations" : {
"sum_10mins" : {
"buckets" : [
{
"key_as_string" : "2022-04-01T10:00:00.000Z",
"key" : 1648810200000,
"doc_count" : 1,
"size_sum" : {
"value" : 100.0
},
"max_roll_size" : {
"value" : 100.0
}
},
{
"key_as_string" : "2022-04-01T11:00:00.000Z",
"key" : 1648810800000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T12:00:00.000Z",
"key" : 1648814400000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_roll_size" : {
"value" : 300.0
}
},
{
"key_as_string" : "2022-04-01T13:00:00.000Z",
"key" : 1648818000000,
"doc_count" : 0,
"size_sum" : {
"value" : 0.0
},
"roll_roll_size" : {
"value" : 400.0
}
}
]
}
}
}
I tried placing the next aggregation in different spots, as a sibling of the first date_histogram aggregation and as a child of it. Here, as a sibling:
POST /test/_search?size=0
{
"aggs": {
"sum_10mins": {
"date_histogram": {
"field": "time",
"fixed_interval": "10m"
},
"aggs": {
"size_sum": {
"sum": {
"field": "size"
}
},
"roll_size": {
"moving_fn": {
"buckets_path": "size_sum",
"window": 6,
"shift": 1,
"script": "MovingFunctions.sum(values)",
"gap_policy": "keep_values"
}
}
}
},
"max_size": {
"date_histogram": {
"field": "aggregations.sum_10mins.buckets.key",
"interval": "day"
}
}
}
}
But this only returns an empty bucket - presumably because the field of a date_histogram has to be a field of the documents and cannot point at the buckets of another aggregation.
Any ideas are welcome.
If it just cannot be done in the Query DSL, is there any other way? For example, ingest pipelines to do steps 1 and 2 and then only step 3 in the Query DSL request, or whatever else. I am quite open to any suggestions - the goal is to get the results and show them in Kibana. A single Elasticsearch request would be the preferred way, and the fewer changes to the original docs, the better.
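If nothing better turns up, the bluntest fallback I can think of is to run only steps 1 and 2 in Elasticsearch and do step 3 in a small client-side script before the data reaches Kibana. A rough sketch (Python, purely illustrative, assuming the response shape shown above):
from collections import defaultdict

def hourly_max_roll_size(response):
    # step 3 on the client: maximum roll_size per hour, taken from the
    # response of the step 1/2 request above
    maxima = defaultdict(float)
    for bucket in response["aggregations"]["sum_10mins"]["buckets"]:
        hour = bucket["key_as_string"][:13]   # e.g. "2022-04-01T13"
        maxima[hour] = max(maxima[hour], bucket["roll_size"]["value"])
    return dict(maxima)

# with the response above: {"2022-04-01T10": 100.0, "2022-04-01T11": 300.0,
#                           "2022-04-01T12": 300.0, "2022-04-01T13": 400.0}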
Regards
Sebastian