I have indexed documents of the form:
{
"device_id": "abc",
"views": 123,
+ other criteria
}
I can compute:
- The count of unique devices, using a cardinality aggregation on device_id
- The overall number of views, using a sum aggregation on views
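For reference, here is a sketch of what those two aggregations could look like as a search request body, written as a Python dict. The aggregation names unique_devices and total_views are placeholders chosen for illustration, and the sketch assumes device_id is mapped as a keyword field and views as a numeric field.

```python
import json

# Request body for the _search endpoint.
# "unique_devices" and "total_views" are illustrative names, not from the original setup.
query = {
    "size": 0,  # only the aggregation results are needed, not the hits
    "aggs": {
        # approximate count of distinct device_id values
        "unique_devices": {"cardinality": {"field": "device_id"}},
        # total of the views field over all matching documents
        "total_views": {"sum": {"field": "views"}},
    },
}

print(json.dumps(query, indent=2))
```

Neither aggregation looks at per-device totals, which is why a per-device cap cannot be expressed with these two alone.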
How can I compute the sum of the views such that any single device contributes at most n views to the total?
For example, if my docs are:
{"device_id": "a", "views": 3, ...}
{"device_id": "a", "views": 4, ...}
{"device_id": "a", "views": 1, ...}
{"device_id": "b", "views": 2, ...}
{"device_id": "c", "views": 6, ...}
and n is 5, then the result should be 12 = 5 for a (capped, even though its total is 8) + 2 for b + 5 for c (capped, since its total is 6).
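To pin down the expected semantics, here is a minimal client-side sketch of the computation on the example documents. It is only a reference definition of the desired result, not a proposed solution at index scale; the function name capped_view_sum is made up for illustration.

```python
from collections import defaultdict


def capped_view_sum(docs, n):
    """Sum views per device, cap each device's total at n, then add the capped totals."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["device_id"]] += doc["views"]
    return sum(min(total, n) for total in totals.values())


# The example documents from above (other fields omitted).
docs = [
    {"device_id": "a", "views": 3},
    {"device_id": "a", "views": 4},
    {"device_id": "a", "views": 1},
    {"device_id": "b", "views": 2},
    {"device_id": "c", "views": 6},
]

print(capped_view_sum(docs, 5))  # 12
```

Note that the cap applies to each device's total across all of its documents, not to individual documents: device a has no single document above 5, yet its total of 8 is capped.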
My indices contain ~500,000 distinct devices.
The result does not need to be exact; an approximation within reasonable bounds is fine.
I do not mind writing my own script using a combination of techniques (HLL, Count-Min Sketch, Bloom filters, MinHash, ...).