Simple aggregation counting distinct values that has turned out to be difficult

Adam_Burr · February 15, 2023, 8:26pm

I have a very simple index and I am trying to produce what I thought would be a simple aggregation, but I am finding it difficult to get working. I would be very grateful for any help the community can give.

My "sales" index has mappings as follows:-

"mappings": {
  "properties": {
    "shop": {
      "type": "keyword"
    },
    "customerid": {
      "type": "keyword"
    },
    "time": {
      "type": "date",
      "format": "epoch_second"
    }
  }
}

I am trying to analyse the data to find customers who make at least X purchases in the last Y days and do so at at least Z different shops. The results I want would give me results that had these important values:-

[{
  "customerid": 12345,
  "distinct_shops": 8,
  "total_sales": 54
},
{
  "customerid": 45678,
  "distinct_shops": 5,
  "total_sales": 101
},
{
  "customerid": 35790,
  "distinct_shops": 20,
  "total_sales": 49
}, ...]

Each of the groups of three values in the above result set would represent a different unique customer. There are millions of sales in the index and many different customers. The results should only list the customers that have been to at least the specified number of shops and made at least the specified number of purchases. And that activity has to have been done in a specified number of days.

All three of those specified parameters to the query need to be variable by the user at run time, so I don't think I can calculate the results in advance.

The counts of the distinct shops visited needs to be accurate, so the cardinality keyword in Elasticsearch doesn't help me.

I would know how to write this very simply in SQL:-

SELECT customerid, count(customerid), COUNT(DISTINCT shop) FROM sales GROUP BY customerid, shop HAVING count(customerid)>=20

But I can't see how to do it in query DSL.

Any suggestions?

Thanks

Adam

system · March 15, 2023, 8:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Count distinct by date Elasticsearch	3	833	October 28, 2019
Will elasticsearch SQL support for cardinality aggregation (DISTINCT) in future? Elasticsearch	4	550	October 19, 2018
Query DSL count distinct Elasticsearch	16	4523	April 13, 2022
Aggregation - Calculate the number of distinct values Elasticsearch	5	434	January 3, 2019
Aggregations using distinct query Elasticsearch	3	441	March 27, 2018

Simple aggregation counting distinct values that has turned out to be difficult

Related topics