Simple aggregation counting distinct values that has turned out to be difficult

I have a very simple index and I am trying to produce what I thought would be a simple aggregation, but I am finding it difficult to get working. I would be very grateful for any help the community can give.

My "sales" index has mappings as follows:-

"mappings": {
  "properties": {
    "shop": {
      "type": "keyword"
    },
    "customerid": {
      "type": "keyword"
    },
    "time": {
      "type": "date",
      "format": "epoch_second"
    }
  }
}

I am trying to analyse the data to find customers who make at least X purchases in the last Y days and do so at at least Z different shops. The results I want would give me results that had these important values:-

[{
  "customerid": 12345,
  "distinct_shops": 8,
  "total_sales": 54
},
{
  "customerid": 45678,
  "distinct_shops": 5,
  "total_sales": 101
},
{
  "customerid": 35790,
  "distinct_shops": 20,
  "total_sales": 49
}, ...]

Each of the groups of three values in the above result set would represent a different unique customer. There are millions of sales in the index and many different customers. The results should only list the customers that have been to at least the specified number of shops and made at least the specified number of purchases. And that activity has to have been done in a specified number of days.

All three of those specified parameters to the query need to be variable by the user at run time, so I don't think I can calculate the results in advance.

The counts of the distinct shops visited needs to be accurate, so the cardinality keyword in Elasticsearch doesn't help me.

I would know how to write this very simply in SQL:-

SELECT customerid, count(customerid), COUNT(DISTINCT shop) FROM sales GROUP BY customerid, shop HAVING count(customerid)>=20

But I can't see how to do it in query DSL.

Any suggestions?

Thanks

Adam

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.