First, there are no "weird questions"; this is an interesting use case.
This can indeed be done with transforms. For this case you create a transform that groups by customer id and collapses the purchases into a list. I described an example in an advent calendar post, unfortunately in German. Taking it from there, you can start with a scripted metric like this:
"all_purchases": {
  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "state.docs.add(new HashMap(params['_source']))",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) { for (d in s) { docs.add(d); } } return docs"
  }
}
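To make the four script phases easier to follow, here is a small Python model of what they do within one group_by bucket (one customer). It is only a local simulation under assumptions, not Elasticsearch code: each shard is represented as a plain list of _source dicts (field names order_id and date taken from the sample output below), and all function names are hypothetical.

```python
from typing import Any

Doc = dict[str, Any]


def init_state() -> dict[str, list[Doc]]:
    # init_script: state.docs = []
    return {"docs": []}


def map_doc(state: dict[str, list[Doc]], source: Doc) -> None:
    # map_script: copy each document's _source into the shard-local state
    state["docs"].append(dict(source))


def combine(state: dict[str, list[Doc]]) -> list[Doc]:
    # combine_script: hand the shard-local list over unchanged
    return state["docs"]


def reduce_states(states: list[list[Doc]]) -> list[Doc]:
    # reduce_script: flatten the per-shard lists into one list
    docs: list[Doc] = []
    for s in states:
        for d in s:
            docs.append(d)
    return docs


def all_purchases(shards: list[list[Doc]]) -> list[Doc]:
    """Run init/map/combine once per shard, then reduce once over all shard results."""
    combined = []
    for shard_docs in shards:
        state = init_state()
        for source in shard_docs:
            map_doc(state, source)
        combined.append(combine(state))
    return reduce_states(combined)


if __name__ == "__main__":
    shards = [
        [{"order_id": 42, "date": "2019-12-01T10:00:00Z"}],
        [{"order_id": 99, "date": "2019-12-02T12:00:00Z"}],
    ]
    print(all_purchases(shards))
```

Note that every order of the customer ends up in the reduced list, which is why the top-n question below matters.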
This would create a list of all purchases:
"all_purchases" : [
  {
    "order_id" : 42,
    "date" : "2019-12-01T10:00:00Z",
    ...
  },
  {
    "order_id" : 99,
    "date" : "2019-12-02T12:00:00Z",
    ...
  },
  ...
]
To get the top n, two possibilities come to mind: "at runtime" or "as post-processing".
Post-processing: sort the list afterwards by order date and cut it at n. You can do this as part of the reduce script, or you can send the output of the transform through an ingest pipeline (the transform's dest.pipeline setting) and use a script processor there.
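A minimal sketch of the post-processing step, written in Python rather than Painless and assuming that "top n" means the n most recent orders. It relies on the dates being ISO-8601 UTC strings in one consistent format (as in the sample output), which sort chronologically as plain strings; with other formats you would parse them first. The function name top_n_post is hypothetical.

```python
from typing import Any


def top_n_post(docs: list[dict[str, Any]], n: int) -> list[dict[str, Any]]:
    """Sort all collected orders by date, newest first, and keep the first n."""
    # Same-format ISO-8601 UTC strings compare in chronological order.
    ordered = sorted(docs, key=lambda d: d["date"], reverse=True)
    return ordered[:n]


if __name__ == "__main__":
    orders = [
        {"order_id": 42, "date": "2019-12-01T10:00:00Z"},
        {"order_id": 99, "date": "2019-12-02T12:00:00Z"},
        {"order_id": 7, "date": "2019-11-30T08:00:00Z"},
    ]
    print([d["order_id"] for d in top_n_post(orders, 2)])  # [99, 42]
```

The sort needs the complete list first, so every order of a customer is held in memory before it is cut down to n.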
At runtime: instead of a list, use a sorted map (a TreeMap) that maps order_date to the order object. When inserting, only add to the map if it has not yet reached n entries, or if its first key (the lowest one) is lower than the key you are looking at; afterwards, trim it back to n. Memory-wise this is the most efficient way, because it never keeps all orders in memory. Memory consumption might not be a problem for an e-commerce use case, but for IoT it could be.
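The runtime variant can be modelled in Python too. Python has no TreeMap, so this sketch swaps in a min-heap from heapq bounded at n entries: heap[0] plays the role of the TreeMap's first (lowest) key, and a new order is only admitted while fewer than n orders are held or when its date is newer than that lowest one. A sequence number breaks ties, so unlike a TreeMap keyed only by date, two orders with the same timestamp do not overwrite each other. Since each shard only keeps its own top n, the sketch assumes a reduce step that merges the shard results and trims to n once more. All function names are hypothetical, and the dates are assumed to be same-format ISO-8601 UTC strings.

```python
import heapq
import itertools
from typing import Any

Doc = dict[str, Any]
Entry = tuple[str, int, Doc]  # (date, tie-breaking sequence number, order)

_seq = itertools.count()


def map_bounded(heap: list[Entry], doc: Doc, n: int) -> None:
    """Admit doc only if fewer than n orders are held or it is newer than the oldest one."""
    entry = (doc["date"], next(_seq), doc)
    if len(heap) < n:
        heapq.heappush(heap, entry)
    elif heap[0][0] < doc["date"]:
        # Drop the oldest held order and insert the new one in a single step.
        heapq.heapreplace(heap, entry)


def reduce_bounded(shard_heaps: list[list[Entry]], n: int) -> list[Doc]:
    """Merge the per-shard top-n heaps and keep the overall n newest orders, newest first."""
    merged = heapq.nlargest(n, itertools.chain.from_iterable(shard_heaps))
    return [doc for _, _, doc in merged]


if __name__ == "__main__":
    shard_a: list[Entry] = []
    shard_b: list[Entry] = []
    for d in [
        {"order_id": 1, "date": "2019-12-01T10:00:00Z"},
        {"order_id": 2, "date": "2019-12-03T10:00:00Z"},
        {"order_id": 3, "date": "2019-11-01T10:00:00Z"},  # older than both held orders: skipped
    ]:
        map_bounded(shard_a, d, 2)
    map_bounded(shard_b, {"order_id": 4, "date": "2019-12-02T10:00:00Z"}, 2)
    print([d["order_id"] for d in reduce_bounded([shard_a, shard_b], 2)])  # [2, 4]
```

Each shard holds at most n orders at any time, which is where the memory saving over the list-based approach comes from.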
I hope this gives you an idea; I would be happy if you shared your result.