Kibana visualization - Average of sum of grouped values

Hello,

i'm trying to achieve the following result using Kibana visualizations (Lens or others).

I have an index where each document is a service delivery to a user.
The documents have a cost and a user_id.

I want two visualizations:

  1. A chart of average costs by month, and this is easily achieved using Lens.
  2. A chart of average costs by user by month. This logically would need to:
  • group service deliveries by user
  • sum the costs of the documents grouped together
  • average the costs
  • plot these by month

Is it possible to achieve 2) with any visualization tool in Kibana?
I guess it's probably doable using Vega, but i'd avoid it if possible.

Thanks

For visualization 2, you should be able to do this with Lens using the collapse by functionality (basically a group-by + a sum).

To do this, add a breakdown dimension on user_id and configure it like so:

Yes, achieving the second visualization (average costs by user by month) in Kibana is possible using the Aggregation features in Lens. You can follow these steps:

  1. Create a New Lens Visualization: Start by creating a new Lens visualization in Kibana.
  2. Select Date Histogram: Drag and drop the date field (representing the service delivery date) to the "Date" field well. This will create a date histogram.
  3. Split Chart by User: Drag and drop the user_id field to the "Break down by" field well. This will split the chart by user.
  4. Aggregate Costs: Drag and drop the cost field to the "Values" field well. By default, it might use "Sum" as the aggregation. Change it to "Average" to get the average cost per user per month.
  5. Adjust Labels and Formatting: You may want to adjust the labels, axis names, and other formatting options to make the visualization more readable.
  6. Save the Visualization: Once you're satisfied with the configuration, save the visualization.

Thank you both for you kind answers.
Unfortunately, i'm still not able to achieve the desired result.

Regarding @Andrew_Tate proposal, i've configured a drilldown on the user_id.
If i set average of cost on Y-axis, without collapse option, I get this:

Then, if I set collapse by Sum, the result is the sum of averages, not averages of sums as wanted. (I cannot put another image being a new user; the result is, on the column for 2023-05-23, the number 468, the sum of the average values).

Even if I switch the functions sum and average in drilldown and Y-axis, I'm still getting something else.
Moreover, the fact that I have to set a number of values feels a little bit weird. What if the maximum number isn't enough?

About @nextgen suggestion, following your instructions I end up with each bar split with user_id reference.
I see where you're coming from; my bad for not explaining it clearly enough.
What i actually need is to sum the costs for each user, so that I get the total cost for a user (composed of different services). Then, I want to average them, to get the average cost by user.

I'm probably missing something, I feel like this shouldn't be that difficult to do.
I'm thinking that maybe I could create another index with this information better organized to make it easier to create the visualization?

To be as clear as i can, what I want to achieve is equivalent to this SQL query (which I'm using to check the numerical results of my tests on Kibana):


 SELECT
 	AVG(user_cost),
 	YEAR,
 	month
 FROM (
	 SELECT 
	 	SUM(cost) AS user_cost,
	 	EXTRACT (YEAR FROM service_timestamp) AS year,
	 	EXTRACT (MONTH FROM service_timestamp) AS MONTH
	 FROM service_deliveries
	 	WHERE cost IS NOT NULL
	 	GROUP BY user_id, YEAR, MONTH 
	 	ORDER BY YEAR DESC, MONTH DESC
) DATA
GROUP BY YEAR, MONTH
ORDER BY YEAR DESC, MONTH DESC

Thanks a lot for your support!

@Gianfranco_Demarco if you want the average of sums instead of the sum of the averages, you can just swap the operations.

Specifically

  • change your metric dimension to use the Lens Sum function
  • change your breakdown dimension collapse by setting to "average"

I hear you. To be clear, Lens uses the terms aggregation under the hood.

Elasticsearch is a distributed system that is built to deal with vast amounts of data. Specifying a number of values helps bound the amount of work that is being requested. Without controls like this, you could inadvertently set off a massive amount of work on high-cardinality data sets.

I think that's the reason it works this way.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.