Need help setting up some forecasting monitoring - Math guys, let's do this

For my particular need, I am attempting to do interval forecasting based on counts. That is, we look through the logs for a certain LogEntryPhrase which, if present, signifies that a certain popup was successful. I want to know how many popups occurred in the last 15 minutes, and then compare that to the same interval one week ago, two weeks ago, etc.

So for Monday from 9:00-9:15, how does that compare to last Monday, the Monday before that, and so on? Think call-center forecasting based on counts. Unfortunately, Kibana doesn't have this capability with date histograms, which is a problem because a date histogram is what a lot of the built-in smoothing / prediction methods (ewma / holt_winters) rely on, and my interval needs to be a specific time frame on particular days.

That is, the counts from 9:00-9:15 aren't very indicative of the expected counts from 9:15-9:30, but the 9:15-9:30 counts from the last few Mondays are.
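For reference, this is roughly what the built-in pipeline smoothing expects (just a sketch; the interval, window, and alpha values are placeholders I made up). The moving_avg has to hang off a date_histogram, so it ends up smoothing consecutive 15-minute buckets rather than the same 15-minute slot across Mondays, which is exactly the problem:

{
  "size": 0,
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "15m"
      },
      "aggs": {
        "smoothed": {
          "moving_avg": {
            "buckets_path": "_count",
            "model": "ewma",
            "window": 8,
            "settings": { "alpha": 0.3 }
          }
        }
      }
    }
  }
}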

Below is my current hack method of doing it, but again, I miss out on a ton of the smoothing techniques I'd need. (How do I handle missing data / anomalies?)

Any advice, or workarounds that someone has found that work for them?

{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": { "gte": "now-10w" }
        }
      },
      "must": [ Phrases I'm looking for are here ]
    }
  },
  "aggs": {
    "history": {
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          { "from": "now-5w-15m", "to": "now-5w" },
          { "from": "now-4w-15m", "to": "now-4w" },
          { "from": "now-3w-15m", "to": "now-3w" },
          { "from": "now-2w-15m", "to": "now-2w" },
          { "from": "now-1w-15m", "to": "now-1w" }
        ]
      },
      "aggs": {
        "my_count": {
          "sum": { "script": "1" }
        }
      }
    },
    "my_stats": {
      "extended_stats_bucket": {
        "buckets_path": "history>my_count"
      }
    },
    "today": {
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          { "from": "now-15m", "to": "now" }
        ]
      }
    }
  }
}
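One small tweak worth noting on the my_stats block: pipeline aggregations accept a gap_policy, and extended_stats_bucket accepts a sigma, so in theory a missing week can be skipped instead of dragging the average down, and the std_deviation_bounds can be set to whatever multiple of the standard deviation I want. (I suspect an empty week actually comes back as a sum of 0 rather than a true gap, so this may not buy much on its own.)

"my_stats": {
  "extended_stats_bucket": {
    "buckets_path": "history>my_count",
    "gap_policy": "skip",
    "sigma": 2
  }
}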

The full query gives me the output I need. It's also incredibly tedious once we increase how much history we store and want more ranges.

For example, here is the "today" part of the results:
"aggregations": {
"today": {
"buckets": [
{
"key": "2017-03-02T15:10:38.439Z-2017-03-02T15:25:38.439Z",
"from": 1488467438439,
"from_as_string": "2017-03-02T15:10:38.439Z",
"to": 1488468338439,
"to_as_string": "2017-03-02T15:25:38.439Z",
"doc_count": 23
}
]
}

And then we get a few buckets that look like this, with their aggregated stats below:
  {
    "key": "2017-02-23T15:10:38.439Z-2017-02-23T15:25:38.439Z",
    "from": 1487862638439,
    "from_as_string": "2017-02-23T15:10:38.439Z",
    "to": 1487863538439,
    "to_as_string": "2017-02-23T15:25:38.439Z",
    "doc_count": 28,
    "my_count": { "value": 28 }
  }
  ]
},
"my_stats": {
  "count": 5,
  "min": 19,
  "max": 28,
  "avg": 24.4,
  "sum": 122,
  "sum_of_squares": 3023,
  "variance": 9.239999999999963,
  "std_deviation": 3.039736830714127,
  "std_deviation_bounds": {
    "upper": 24.4,
    "lower": 24.4
  }
}
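Just to spell out how I would use this: with an avg of 24.4 and a std_deviation of about 3.04, a two-sigma band would be roughly 24.4 ± 6.1, i.e. about 18.3 to 30.5, and today's doc_count of 23 sits comfortably inside it, so this interval would not be flagged. (The std_deviation_bounds in my output come back equal to the avg, which I haven't dug into yet.)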

But again, I really am missing out on smoothing techniques that can handle missing data / anomalies. Any ideas?
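One direction I've been sketching myself (purely a sketch: the painless date accessors, the hard-coded weekday / minute-of-day params, and the timezone handling are all assumptions, and the exact script syntax varies by ES version) is to filter the documents down to just the target weekday and time-of-day window, then bucket them with a weekly date_histogram. That way each bucket is one "Monday 9:00-9:15", the buckets are consecutive, and the moving_avg / holt_winters pipelines become usable again:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-10w" } } },
        {
          "script": {
            "script": {
              "lang": "painless",
              "inline": "def d = doc['@timestamp'].date; int m = d.hourOfDay * 60 + d.minuteOfHour; return d.dayOfWeek == params.dow && m >= params.from_min && m < params.to_min;",
              "params": { "dow": 1, "from_min": 540, "to_min": 555 }
            }
          }
        }
      ],
      "must": [ Phrases I'm looking for are here ]
    }
  },
  "aggs": {
    "per_week": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1w"
      },
      "aggs": {
        "smoothed": {
          "moving_avg": {
            "buckets_path": "_count",
            "model": "ewma",
            "window": 5
          }
        }
      }
    }
  }
}

(In Joda time, dayOfWeek 1 is Monday, and 540 / 555 are the minutes-of-day for 9:00 / 9:15.)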

This is a tough one.

I'm not sure about smoothing methods, or how you would even integrate these into Kibana. Kibana will render your ES results, but doesn't really do post-processing on the data. So you might try your luck with your question in the ES forum: https://discuss.elastic.co/c/elasticsearch.

A few off-the-cuff thoughts w.r.t. smoothing:

Are you talking about line-fitting, or interpolation?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.