Aggregation with unique count and sum: how to do it?

Hello.
We have some documents in ES (7.7.0) that represent some kind of ecommerce sales transactions, that can be described as the following table

| tx_id |  item      | cost | price |
-------------------------------------
| 1     |   item1    | 10   | 15    |
| 2     |   item2    | 15   | 20    | 
| 3     |   item3    | 20   | 30    |
| 1     |   item1    | 10   | 15    |
| 4     |   item4    | 20   | 25    |
| 5     |   item2    | 15   | 20    |
| 1     |   item1    | 10   | 15    |
| 6     |   item2    | 15   | 20    |
| 2     |   item2    | 15   | 20    | 
| 7     |   item1    | 10   | 15    | 

For reasons that are too long to explains, some of the documents contains "duplicated transactions".
In this example data table, we have 3 documents with the same transaction_id=1 and 2 documents with the same transaction_id=2.
We can't change the application that writes the transactions in the documents to avoid the duplication, so we have to live with that constraint.

What we are trying to achieve, is a consolidated report with the following data:

| item  | unit_cost | unit_price | sold units | cost | revenue |
----------------------------------------------------------------
| item1 |    10     |      15    |    2       | 20   |  30     |  <-- tx_id 1, 7, but tx_id=1 appears 3 times in the data table
| item2 |    15     |      20    |    3       | 45   |  60     |  <-- tx_id 2, 5, 6, but tx_id=2 appears 2 times in the data table
| item3 |    20     |      30    |    1       | 20   |  30     |  <-- tx_id 3
| item4 |    20     |      25    |    1       | 20   |  25     |  <-- tx_id 4

We are struggling to get this aggregated table in kibana.
We tried to create terms buckets for the fields item, cost and price, and calculating the metrics unique count of tx_id to get the sold_units, but then how to calculate the metrics for the cost and revenue?
Using the metric sum will not work, because the sum of the bucket item:item1 --> unit_cost:10 --> unit_price:15 contains 4 documents, because of the duplication of tx_id=1

Any hints or suggestions?

Thanks.

SLL

Hey @simonlucalandi! I'm not 100% certain if Kibana can produce something exactly like that but I think we can get close...

Can you can create a terms aggregation on your tx_id? Just trying to follow along to this previous issue I found: Sum of unique ids

Hello!
Thank you for the suggestion.

I'm now trying to use the "transform data" feature to create a pivot that removes the duplicated transaction (using "Max" aggregation) that save the clean data in a new index, and then create visualization on this new index.

It seams to be fit our needs.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.