Which Visualization works best for computing averages?


(Ted W. Stapenhorst) #1

Hello,

I'm bring into Logstash student attendance data (Days Enrolled - EN, and Days Present - PR). I want to sum up all the EN days for every student at each school, sum up all PR days for every student at each school, and divide PR by EN (PR/EN) to get the Average Daily Attendance at each school and for each grade at that school. Which visualization works the best for computing averages? Do you have some examples of how I could do this?

Thank you in advance.
Ted.


(CJ Cenizal) #2

Hi Ted, what does one of your documents look like? I imagine it would be something like this:

{
  school: "A school name or ID",
  grade: 9,
  student: "A student name or ID",
  days_enrolled: 304,
  days_present: 139,
}

And then it sounds like you want to create two types of Bar Charts: one which has average attendance on the Y axis and school on the X axis, and another which has average attendance on the Y axis and grade broken down per school on the X axis.

You can use the Visualize app to do this, but first you need to add some information using our Scripted Fields feature. Visit Management > Index Patterns, and then click the "Scripted Fields" tab. Here you'll need to create a new Scripted Field. We want to use this scripted field to calculate the average attendance for each document for us. Name your field something which makes sense to you, and then specify the script to be something like this:

if (doc['days_enrolled'].value == 0) {
  return 0;
}

return doc['days_present'].value / doc['days_enrolled'].value;

The first if condition will protect against a divide-by-zero error in the event a student hasn't enrolled for any duration of time. The final return value is the PR/EN average attendance you're looking for, but only per document.

Next we need to define a visualization which will use this scripted field to show us the average attendance per school and per grade within each school.

Go to the Visualize app and create a new Bart Chart. Under "Metrics" define your Y-Axis with an Aggregation of "Average" and select your scripted field from the Field dropdown. Under "Buckets", define your X-Axis with an Aggregation of "Terms" and select the "school" field (or whatever your school field is named) from the dropdown. Hit the play button at the top to create the visualization. You should now see a Bar Chart showing you the average attendance per school. You can change the Size value to show more or fewer schools.

Now you can change this to also show grades per school. Change the field for this X-Axis to "grade" (or the equivalent field in your documents). Then click the "Add sub-buckets" button, select "Split chart" and click the "Columns" button beneath the new split chart you've added. Set the Sub-aggregation to "Terms" and set Field to "school". Hit the play button again and you should now see the grades split up into each school.

Let me know if you need any more help!

Thanks,
CJ


(Ted W. Stapenhorst) #3

Hi CJ,

Thank you for your replay. Yes, my data looks like the following:

SC,ID,FN,LN,EN,PR
VV,9009999,Billy,Smith,107,105

I will further investigate your solution, and I'll provide you some feedback. Thank you again.

Ted.


(Ted W. Stapenhorst) #4

Hi CJ,

If I'm bring in about 12,500 summary totals on all students everyday (the database adds up all the totals within the application once a day at 12:30 AM with the previous day's attendance and stores them in an ATTHIST table), what kind of performance hit on the cluster will I see if my end users want to look at the Average Daily Attendance (ADA) on any day within the school year? I will only be sending new data into ElasticSearch just once a day. Will the averages (ADA) be recalculated every time the user selects a different day? Thank you in advance.

Ted.


(CJ Cenizal) #5

Hi Ted, yes the averages are calculated every time you execute the Elasticsearch query, i.e. every time that Bar Chart is refreshed. You can select an auto-refresh interval in the top right corner if you'd like to automatically refresh it on an interval.

In terms of performance, are you asking about performance at index-time (when your 12,500 documents are indexed) or query-time (when the Bar Chart is viewed)?

CJ


(Ted W. Stapenhorst) #6

Hi CJ,

Both. Since the 12,500 documents are going to be coming in at 1:00 AM, which is not a big deal at that time of day. I'm more concerned about the 35-50 end users that could look at the visualization at anytime between 7:00 AM and 4:00 PM on a daily basis. I'm assuming the bigger hit on the cluster is when the end users want to see what the yearly ADA is on a given day. My end users are going to come back to me and ask if I can give them the ADA for just one day, a week, or any span of time they want. That is another part of this project that I will have to deal with, but that is down the road. Thank you in advance.

Ted.


(CJ Cenizal) #7

It's tough to say ahead of time what your index-time performance would be like. I'd read over our performance tuning tips and follow the progressive approaches outlined there to make sure you're able to identify and address bottlenecks as they arise.

In terms of query performance, again it's tough to say though I can say that the scripted field will definitely impact performance (though whether this will be noticeable is unknown). You can remove the need for scripted fields by using Logstash to calculate the ADA per document and store it as a field on that document. This will definitely speed up the query.

CJ


(Ted W. Stapenhorst) #8

Hi CJ,

Thank you for the answers.

Ted.


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.