Need help setting up some forecasting monitoring - Math guys, let's do this

For my particular need, I am attempting to do interval forecasting based on counts. That is, we look through the logs for a certain LogEntryPhrase which, if present, signifies that a certain popup was successful. I want to know how many popups occurred in the last 15 minutes, and then compare that to the same interval one week ago, two weeks ago, etc.

So for Monday from 9:00-9:15, how does that compare to last Monday, the Monday before that, and so on? Think call-center forecasting based on counts. Unfortunately, Kibana doesn't have this capability with date histograms, which is a problem because a date histogram is what a lot of the built-in smoothing / prediction methods (ewma / holt_winters) rely on, and my interval needs to be a specific time frame on particular days.

That is, the counts from 9:00-9:15 aren't very indicative of the expected counts from 9:15-9:30, but the 9:15-9:30 counts from the last few Mondays are.
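For reference, this is roughly what the built-in pipeline smoothing expects (just a sketch; the interval, window, and alpha values are placeholders I made up). The moving_avg has to hang off a date_histogram, so it ends up smoothing consecutive 15-minute buckets rather than the same 15-minute slot across Mondays, which is exactly the problem:

{
  "size": 0,
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "15m"
      },
      "aggs": {
        "smoothed": {
          "moving_avg": {
            "buckets_path": "_count",
            "model": "ewma",
            "window": 8,
            "settings": { "alpha": 0.3 }
          }
        }
      }
    }
  }
}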

Below is my current hack method of doing it, but again, I miss out on a ton of the smoothing techniques I'd need. (How do I handle missing data / anomalies?)

Any advice, or workarounds that someone has found that work for them?

{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": { "gte": "now-10w" }
        }
      },
      "must": [ Phrases I'm looking for are here ]
    }
  },
  "aggs": {
    "history": {
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          { "from": "now-5w-15m", "to": "now-5w" },
          { "from": "now-4w-15m", "to": "now-4w" },
          { "from": "now-3w-15m", "to": "now-3w" },
          { "from": "now-2w-15m", "to": "now-2w" },
          { "from": "now-1w-15m", "to": "now-1w" }
        ]
      },
      "aggs": {
        "my_count": {
          "sum": { "script": "1" }
        }
      }
    },
    "my_stats": {
      "extended_stats_bucket": {
        "buckets_path": "history>my_count"
      }
    },
    "today": {
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          { "from": "now-15m", "to": "now" }
        ]
      }
    }
  }
}
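One small tweak worth noting on the my_stats block: pipeline aggregations accept a gap_policy, and extended_stats_bucket accepts a sigma, so in theory a missing week can be skipped instead of dragging the average down, and the std_deviation_bounds can be set to whatever multiple of the standard deviation I want. (I suspect an empty week actually comes back as a sum of 0 rather than a true gap, so this may not buy much on its own.)

"my_stats": {
  "extended_stats_bucket": {
    "buckets_path": "history>my_count",
    "gap_policy": "skip",
    "sigma": 2
  }
}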

The full query gives me the output I need. It's also incredibly tedious once we increase how much history we store and want more ranges.

For example, here is the "today" part of the results:
"aggregations": {
"today": {
"buckets": [
{
"key": "2017-03-02T15:10:38.439Z-2017-03-02T15:25:38.439Z",
"from": 1488467438439,
"from_as_string": "2017-03-02T15:10:38.439Z",
"to": 1488468338439,
"to_as_string": "2017-03-02T15:25:38.439Z",
"doc_count": 23
}
]
}

And then we get a few buckets that look like this, with their aggregated stats below:
  {
    "key": "2017-02-23T15:10:38.439Z-2017-02-23T15:25:38.439Z",
    "from": 1487862638439,
    "from_as_string": "2017-02-23T15:10:38.439Z",
    "to": 1487863538439,
    "to_as_string": "2017-02-23T15:25:38.439Z",
    "doc_count": 28,
    "my_count": { "value": 28 }
  }
  ]
},
"my_stats": {
  "count": 5,
  "min": 19,
  "max": 28,
  "avg": 24.4,
  "sum": 122,
  "sum_of_squares": 3023,
  "variance": 9.239999999999963,
  "std_deviation": 3.039736830714127,
  "std_deviation_bounds": {
    "upper": 24.4,
    "lower": 24.4
  }
}
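Just to spell out how I would use this: with an avg of 24.4 and a std_deviation of about 3.04, a two-sigma band would be roughly 24.4 ± 6.1, i.e. about 18.3 to 30.5, and today's doc_count of 23 sits comfortably inside it, so this interval would not be flagged. (The std_deviation_bounds in my output come back equal to the avg, which I haven't dug into yet.)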

But again, I really am missing out on smoothing techniques that can handle missing data / anomalies. Any ideas?
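One direction I've been sketching myself (purely a sketch: the painless date accessors, the hard-coded weekday / minute-of-day params, and the timezone handling are all assumptions, and the exact script syntax varies by ES version) is to filter the documents down to just the target weekday and time-of-day window, then bucket them with a weekly date_histogram. That way each bucket is one "Monday 9:00-9:15", the buckets are consecutive, and the moving_avg / holt_winters pipelines become usable again:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-10w" } } },
        {
          "script": {
            "script": {
              "lang": "painless",
              "inline": "def d = doc['@timestamp'].date; int m = d.hourOfDay * 60 + d.minuteOfHour; return d.dayOfWeek == params.dow && m >= params.from_min && m < params.to_min;",
              "params": { "dow": 1, "from_min": 540, "to_min": 555 }
            }
          }
        }
      ],
      "must": [ Phrases I'm looking for are here ]
    }
  },
  "aggs": {
    "per_week": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1w"
      },
      "aggs": {
        "smoothed": {
          "moving_avg": {
            "buckets_path": "_count",
            "model": "ewma",
            "window": 5
          }
        }
      }
    }
  }
}

(In Joda time, dayOfWeek 1 is Monday, and 540 / 555 are the minutes-of-day for 9:00 / 9:15.)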

This is a tough one.

I'm not sure about smoothing methods, or how you would even integrate these into Kibana. Kibana will render your ES results, but doesn't really do post-processing on the data. So you might try your luck with your question in the ES forum: https://discuss.elastic.co/c/elasticsearch.

A few off-the-cuff thoughts w.r.t. smoothing:

Are you talking about line-fitting, or interpolation?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.