Best practices to create dashboards with historical data indexes

PMF · May 2, 2024, 12:35pm

Hi. First time posting here, I'm also quite newbie working with ELK so plz be patient.

We are studying how to build some dashboards across an index where we are storing historical data of our notification system.

Each time any notification changes status we store a new document on our index, and every notification has a unique identifier which allow us to identify that different documents refers to the same notification.

Now... this index is perfect to study with great detail problems within specific time ranges, or within specific notifications, as we can see the complete road it followed, which fields changed, when, what values there were before, etc...

It's also a great option to create dashboards with graphs where we show the evolution in time of a metric or a value.

But we are struggling when we want to create graphs that uses aggregation, because when we got several documents for each notification on the time frame we are querying, we don't know which of the records is being aggregated.

For example, imagine we want to create a graph that shows how many notifications we have pending and how many we have succesfully delivered in a time frame, so we aggregate by status value.

If I have several different documents of notification 'X' in my time frame, some of them with 'pending' state and some with 'delivered' state, I will get several hits in each aggregated value, and if I ask my graph to use a unique count, I won't be able to know which of the valid documents is being accounted for, so I don't really know if my 'X' notification is being accounted as 'pending' or as 'delivered', as on this time frame I have both values.

This seems like something that we could fix using a second index. We can use a first index with historical data, and another one were we record just the last state (more like a relational database). This way, we could use the historical index to create dashboards and graphs that explore time distributed info, and another index just to get info about 'current' state of affairs on our notification system.

The problem here, is that although this approach would be valid to get our 'status graph' with current info, we lose the ability to query that graph to ask it how many delivered and pending notifications we have past month, or past year, as when I update a notification I lose it's previous info.

If notification 'X' is delivered today and was pending all over the last month, if I query the graph in a time frame of the last month it won't be accounted for, as there won't be any record for notification 'X' referred to last month (as it was updated today).

So... our still relational thinking minds are trying to realize how (if it's possible) would we be able to query our historical index in a time frame, but when making a unique count of elements, be able to force the graph to get the last document of each notification.

Something deep inside me is shouting that we are trying to use a big data approach to a relational problem. But in the end, we aren't trying to do anything so strange, I'm quite sure that must have a correct way to achieve what we are trying to do, just we don't realize, neither find, how to do it.

Any insight on this topic will be much appreciated. Sorry if I express myself a bit clunkily, as english is not my mother language.

Giovanni_Magni · May 2, 2024, 1:50pm

Hello Pablo, thank you for writing us. If I got this correctly there are two important things here.

First, the use of the Last Value functionality, which is implemented into Lens and that should exactly what you are looking for in term of feature.

On a separate basis, it seems to me that you would like to make this chart somehow independent from the timerange selected at the dashboard level, am I correct? So that it always shows the latest status of these notification as of today

Let me tag @Marco_Liberati which would be of great help since he has a way bigger experience than me on the subject!

Best,
Giovanni

PMF · May 3, 2024, 6:19am

Hello Giovanni.

The exact problem resides in finding a way to allow the chart to depend from the time range, but allow it to get every time the last value from those valid in the time range selected, not just the last value from all of those in the index.

PMF · May 8, 2024, 12:13pm

Ok, I think this caption explains better our problem

As you can see, I am counting the unique notification elements, and splitting them by some field value (notificacion.acuses). So, if this value is 0, the notification is yet awaiting delivery, and if it's > 0, it has been already delivered.

On the graph I'm showing three slices, Not delivered (=0), delivered (>0) and all of them (>=0) but the sum of =0 and >0 is greater than the result of >=0.

This is because, on our time frame, there are notifications which have multiple documents, some of them with value 0, and some other with value 1 or greater, so they are being counted on both slices.

What we are trying to achieve here is, that on those cases where multiple documents of the same element are present on our query time frame, we want the system to take in account just the last one.

Is that even possible? Or that's not a valid concept in elastic?
If there's any way to achieve this, how should we approach it?

Topic		Replies	Views
How to aggregate data on the dashboard from two different Indexes Kibana	4	653	March 2, 2020
Creating new dashboards with existing one by changing type/index Kibana	3	856	July 6, 2017
Kibana Dashboards for inventory tracking with deleted indexes Kibana	6	268	August 9, 2023
Single Dashboard utilization for different index Kibana	7	1490	July 6, 2017
Different dashboard from multiple indexes for different project logs Kibana	6	4514	September 13, 2019

Best practices to create dashboards with historical data indexes

Related topics