Standard Deviations, Revisited

gardanni · March 30, 2020, 6:12pm

I'm going down one rabbit hole after another. I wonder if someone would offer some input. I have hourly data that I want to aggregate by Date and Count. Based on this data, I want to establish another column showing how many Standard Deviations the Count is away from the average. The average of the data below is 99.4, and the standard deviation is 17.4.

I've explored Data Table, Timelion, and Visual Builder. It seems that Visual Builder allows some sort of sibling pipeline Standard Deviation aggregation, but I am at a loss as to how to get it to work for my data, and at this point I don't think it will! Any insight would be greatly appreciated.

Date	Count	SDs
2/1/2020	119	1.1
2/2/2020	105	0.3
2/3/2020	109	0.6
2/4/2020	89	-0.6
2/5/2020	75	-1.4
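
For reference, the SDs column in the table can be reproduced outside Kibana with a short Python sketch. It assumes the quoted 17.4 is the sample standard deviation (n - 1 denominator), which is what matches the five Counts shown. The function name `sds_from_mean` is made up for illustration.

```python
import statistics

# Daily counts copied from the table above.
daily_counts = {
    "2/1/2020": 119,
    "2/2/2020": 105,
    "2/3/2020": 109,
    "2/4/2020": 89,
    "2/5/2020": 75,
}


def sds_from_mean(counts):
    """Return {date: number of standard deviations from the mean}."""
    values = list(counts.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return {date: (count - mean) / sd for date, count in counts.items()}


# Print the same three columns as the table: Date, Count, SDs.
for date, z in sds_from_mean(daily_counts).items():
    print(f"{date}\t{daily_counts[date]}\t{z:.1f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` would give the population form instead, which yields a smaller standard deviation and therefore larger SDs values.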

gardanni · March 31, 2020, 2:41pm

And perhaps an easier question... Could this be done if I had a data set that consisted of exactly one record per day - so that the initial aggregation by day wouldn't be needed?

Nathan_Reese · March 31, 2020, 3:40pm

"I have hourly data that I want to aggregate by Date and Count"

What does your document structure look like? Do your documents already contain aggregated data? Why not just store all data in ES?

Here is some reference material on pipeline aggregations

gardanni · March 31, 2020, 3:46pm

The documents contain labels, timestamps and volumes... pretty much as shown in the table. The data is hourly. That is all stored in ES. The standard deviation would need to be calculated from the daily Counts over the selected timeframe. The desired "SDs" column would then show how many Standard Deviations a particular day's Count was from the average Count for that timeframe.
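
A minimal Python sketch of the two steps described here: roll hourly documents up to one total per day, then express each day's total in standard deviations from the mean. The document shape (`timestamp` as an ISO string, `volume` as an integer) and the function names are assumptions for illustration, not the actual mapping.

```python
from collections import defaultdict
from datetime import datetime
import statistics


def daily_totals(docs):
    """Sum the hourly volumes into one total per calendar day."""
    totals = defaultdict(int)
    for doc in docs:
        day = datetime.fromisoformat(doc["timestamp"]).date()
        totals[day] += doc["volume"]
    return dict(sorted(totals.items()))


def sds_per_day(totals):
    """Return {day: (total - mean) / sample SD} over the given days."""
    values = list(totals.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two days
    return {day: (total - mean) / sd for day, total in totals.items()}


# Made-up hourly documents whose daily sums match the first two table rows.
docs = [
    {"label": "a", "timestamp": "2020-02-01T00:00:00", "volume": 60},
    {"label": "a", "timestamp": "2020-02-01T01:00:00", "volume": 59},
    {"label": "a", "timestamp": "2020-02-02T00:00:00", "volume": 50},
    {"label": "a", "timestamp": "2020-02-02T01:00:00", "volume": 55},
]

totals = daily_totals(docs)
print(totals)
print(sds_per_day(totals))
```

With one record per day already in the data, `daily_totals` becomes unnecessary and only the second function is needed.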

I suggested that a separate "daily" data set might be established if calculating standard deviations based on aggregating counts per day was not plausible.

Does this answer your question? I may be missing some nuance of what you're asking - this is all very new to me.

Many thanks for your interest!

gardanni · April 1, 2020, 4:45pm

I looked at the reference info on pipeline aggregations, and came to the following conclusion...
What I want to do can't be done through the graphical user interface, but might be doable via Elasticsearch scripting. However, the code in the aggregation reference you pointed to is beyond my comprehension.
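
As a rough sketch of what such a request could look like, the body below combines a `date_histogram`, a `sum` per day, and an `extended_stats_bucket` sibling pipeline aggregation, written as a Python dict. The field names `timestamp` and `volume` and the aggregation names `per_day`, `daily_count`, and `count_stats` are assumptions. The final division is done client-side in this sketch. As far as I know, the `std_deviation` returned by this aggregation is the population form, so the results would not match a sample figure such as 17.4.

```python
# Body for a search request against the index holding the hourly documents.
# "timestamp" and "volume" are assumed field names.
request_body = {
    "size": 0,  # only the aggregation results are wanted, no hits
    "aggs": {
        "per_day": {
            # "calendar_interval" applies to Elasticsearch 7.2+;
            # older versions used "interval".
            "date_histogram": {"field": "timestamp", "calendar_interval": "1d"},
            "aggs": {"daily_count": {"sum": {"field": "volume"}}},
        },
        # Sibling pipeline aggregation: statistics across the daily sums.
        "count_stats": {
            "extended_stats_bucket": {"buckets_path": "per_day>daily_count"}
        },
    },
}


def sds_from_response(response):
    """Return {day: SDs from the mean} from a search response dict."""
    aggs = response["aggregations"]
    mean = aggs["count_stats"]["avg"]
    sd = aggs["count_stats"]["std_deviation"]
    return {
        bucket["key_as_string"]: (bucket["daily_count"]["value"] - mean) / sd
        for bucket in aggs["per_day"]["buckets"]
    }
```

The response passed to `sds_from_response` would be the parsed JSON from the search call, made with whichever client is in use.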

I've scoured the web, and am coming up blank... Do you know of any noob-friendly reference material on Aggregations 101 that includes discussion and examples? To a massive degree, the documentation I've seen on the elastic.co site is super-technical, and short on the sort of examples, discussion, and explanations that might allow me to advance my understanding.

Thanks again.

system · April 29, 2020, 4:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
