Aggregation with unique count and sum: how to do it?

simonlucalandi · May 26, 2020, 9:34pm

Hello.
We have some documents in ES (7.7.0) that represent some kind of ecommerce sales transactions, that can be described as the following table

| tx_id |  item      | cost | price |
-------------------------------------
| 1     |   item1    | 10   | 15    |
| 2     |   item2    | 15   | 20    | 
| 3     |   item3    | 20   | 30    |
| 1     |   item1    | 10   | 15    |
| 4     |   item4    | 20   | 25    |
| 5     |   item2    | 15   | 20    |
| 1     |   item1    | 10   | 15    |
| 6     |   item2    | 15   | 20    |
| 2     |   item2    | 15   | 20    | 
| 7     |   item1    | 10   | 15    |

For reasons that are too long to explains, some of the documents contains "duplicated transactions".
In this example data table, we have 3 documents with the same transaction_id=1 and 2 documents with the same transaction_id=2.
We can't change the application that writes the transactions in the documents to avoid the duplication, so we have to live with that constraint.

What we are trying to achieve, is a consolidated report with the following data:

| item  | unit_cost | unit_price | sold units | cost | revenue |
----------------------------------------------------------------
| item1 |    10     |      15    |    2       | 20   |  30     |  <-- tx_id 1, 7, but tx_id=1 appears 3 times in the data table
| item2 |    15     |      20    |    3       | 45   |  60     |  <-- tx_id 2, 5, 6, but tx_id=2 appears 2 times in the data table
| item3 |    20     |      30    |    1       | 20   |  30     |  <-- tx_id 3
| item4 |    20     |      25    |    1       | 20   |  25     |  <-- tx_id 4

We are struggling to get this aggregated table in kibana.
We tried to create terms buckets for the fields item, cost and price, and calculating the metrics unique count of tx_id to get the sold_units, but then how to calculate the metrics for the cost and revenue?
Using the metric sum will not work, because the sum of the bucket item:item1 --> unit_cost:10 --> unit_price:15 contains 4 documents, because of the duplication of tx_id=1

Any hints or suggestions?

Thanks.

SLL

myasonik · May 26, 2020, 11:16pm

Hey @simonlucalandi! I'm not 100% certain if Kibana can produce something exactly like that but I think we can get close...

Can you can create a terms aggregation on your tx_id? Just trying to follow along to this previous issue I found: Sum of unique ids

simonlucalandi · May 28, 2020, 7:54am

Hello!
Thank you for the suggestion.

I'm now trying to use the "transform data" feature to create a pivot that removes the duplicated transaction (using "Max" aggregation) that save the clean data in a new index, and then create visualization on this new index.

It seams to be fit our needs.

system · June 25, 2020, 7:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Sum aggregation along with Unique Count aggregation Kibana	5	5401	July 6, 2017
How to remove duplication when aggregating data for visualization? Kibana visualisation	1	255	March 1, 2023
Counting the duplicates and non-duplicates of a count aggregation (revisited) Kibana	3	603	November 4, 2020
Treating duplicate key values as one Kibana	2	531	November 11, 2019
Aggregation: unique count grouped by combination of field value Kibana	3	286	November 24, 2021

Aggregation with unique count and sum: how to do it?

Related topics