Query with two aggregations

KibanaUse · April 16, 2018, 9:06am

Hello,

I am tying to write a query to first get the last state value per customer, and then counting those up.

So I have logs of customers with states. Each time a state of a customer changes, I write this to a log file. And through a query I want to be able to count how many of customer is a "available" state at the moment.

It should not count the customer, that were available once and not available any longer OR should not count the customers twice if an available customer became unavailable first and then available again.

Thanks in advance!

Mark_Harwood · April 16, 2018, 9:46am

This potentially expensive to compute if you have to consider every customer as an individual, determining the latest status for each.
Would it perhaps be possible to compute a "good-enough" stat by subtracting counts of the various states? e.g.

numAvailable = totalNumAvailableStatesRecorded - totalNumUnavailableStatesRecorded

This would only work if each customer only ever has 2 states recorded - available then unavailable.

KibanaUse · April 16, 2018, 10:41am

@Mark_Harwood Thank you for the answer. My data has more than 2 states.

You said it is expensive to compute, does this mean it is possible?? I am at the moment interested in getting the exact accurate result rather than having performance issues.

Mark_Harwood · April 16, 2018, 11:03am

My data has more than 2 states.

If the states are mutually exclusive and recorded once per customer then that sort of arithmetic may still be accurate.

You said it is expensive to compute, does this mean it is possible??

Yes, but may require you to re-think index design due to processing costs of performing distributed joins. See entity centric indexing: https://www.youtube.com/watch?v=yBf7oeJKH2Y

TimWard · April 16, 2018, 11:38am

Other people seem to have different understandings of what you're asking, but from my understanding (log lines relating customer <-> state, and you're interested in the latest such for each customer) you start by using a terms aggregation to sort the log lines into per-customer buckets, and then you use a top hits aggregation to get the latest state for each customer. So far so standard, and works fine - it can't, I hope, be very difficult to work out how to add a third aggregation to count how many you've then got in each state? Another terms aggregation into buckets by state?

KibanaUse · April 16, 2018, 11:49am

@TimWard. @Mark_Harwood to be more precise;

Here are the two screenshots of my data representation:

The first is the logs for the customers, their states and the date the log is written.

I would like to be able to count the number of customers they are currently in greens, reds and yellows state.

In this example, when I query red, I should get 1 since Ingolstadt CC turned to green later OR when I query for green, the reult should be 8 AND for yellow, the result should be 2.

So I thought about writing a query. I have only used the Kibana editor so far for my visualizations. Therefore asking if this would be possible by writing a query.

Thank you!

Mark_Harwood · April 16, 2018, 12:26pm

OK so it looks like the state transitions aren't always in the same sequence so my arithmetic suggestion wouldn't work. You need to consider each customer individually.

If you have many (eg hundreds of thousands plus) of customers then this causes the "bucket explosion" problem referenced in my video link.
This will be too much work to attempt in a single query. You can try use term partitioning to break the analysis into several smaller queries. The docs even include an example of how to get the latest value for each ID. This will not play nicely with Kibana though.
Using the entity centric example I gave the cons are a more complex indexing strategy but the pros are faster, simpler queries and Kibana-compatible indexes

system · May 14, 2018, 12:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How can I select the latest state of the data? Kibana	2	434	May 11, 2018
Elasticsearch query to filter count by aggregate Elasticsearch	1	2590	October 1, 2018
Count the number of occurrences for each status only considering the last registred status - Table visualization Kibana	5	2071	August 15, 2019
Complex bucket aggregation Elasticsearch	4	248	September 15, 2021
Aggregation over an aggregation Elasticsearch	3	409	April 11, 2018

Query with two aggregations

Related topics