Hello.
We have some documents in ES (7.7.0) that represent some kind of ecommerce sales transactions, that can be described as the following table
| tx_id | item | cost | price |
-------------------------------------
| 1 | item1 | 10 | 15 |
| 2 | item2 | 15 | 20 |
| 3 | item3 | 20 | 30 |
| 1 | item1 | 10 | 15 |
| 4 | item4 | 20 | 25 |
| 5 | item2 | 15 | 20 |
| 1 | item1 | 10 | 15 |
| 6 | item2 | 15 | 20 |
| 2 | item2 | 15 | 20 |
| 7 | item1 | 10 | 15 |
For reasons that are too long to explains, some of the documents contains "duplicated transactions".
In this example data table, we have 3 documents with the same transaction_id=1 and 2 documents with the same transaction_id=2.
We can't change the application that writes the transactions in the documents to avoid the duplication, so we have to live with that constraint.
What we are trying to achieve, is a consolidated report with the following data:
| item | unit_cost | unit_price | sold units | cost | revenue |
----------------------------------------------------------------
| item1 | 10 | 15 | 2 | 20 | 30 | <-- tx_id 1, 7, but tx_id=1 appears 3 times in the data table
| item2 | 15 | 20 | 3 | 45 | 60 | <-- tx_id 2, 5, 6, but tx_id=2 appears 2 times in the data table
| item3 | 20 | 30 | 1 | 20 | 30 | <-- tx_id 3
| item4 | 20 | 25 | 1 | 20 | 25 | <-- tx_id 4
We are struggling to get this aggregated table in kibana.
We tried to create terms
buckets for the fields item
, cost
and price
, and calculating the metrics unique count of tx_id
to get the sold_units
, but then how to calculate the metrics for the cost
and revenue
?
Using the metric sum
will not work, because the sum of the bucket item:item1 --> unit_cost:10 --> unit_price:15
contains 4 documents, because of the duplication of tx_id=1
Any hints or suggestions?
Thanks.
SLL