Frequency capped sum

jean.logeart · February 11, 2016, 6:08pm

I have indexed documents of the form:

{
  "device_id": "abc",
  "views": 123,
  + other criteria
}

I can compute:

- The count of unique devices, using a cardinality aggregation on device_id
- The overall number of views, using a sum aggregation on views

How can I compute the sum of views such that any given device contributes at most n views?

For example, if my docs are:

{"device_id": "a", "views": 3, ...}
{"device_id": "a", "views": 4, ...}
{"device_id": "a", "views": 1, ...}
{"device_id": "b", "views": 2, ...}
{"device_id": "c", "views": 6, ...}

And n is 5, then the result should be 12: 5 for a (capped, since its total is 8), 2 for b, and 5 for c (capped from 6).
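To make the expected result unambiguous, here is a minimal Python sketch of the exact computation, run outside Elasticsearch. The function name `capped_sum` and the in-memory `docs` list are only for illustration; the point is that views are first totalled per device and the cap is applied to each device's total, not to individual documents.

```python
from collections import defaultdict


def capped_sum(docs, cap):
    """Sum views so that each device contributes at most `cap` views."""
    totals = defaultdict(int)
    for doc in docs:
        # First total the views per device across all of its documents.
        totals[doc["device_id"]] += doc["views"]
    # Then cap each device's total before summing.
    return sum(min(total, cap) for total in totals.values())


docs = [
    {"device_id": "a", "views": 3},
    {"device_id": "a", "views": 4},
    {"device_id": "a", "views": 1},
    {"device_id": "b", "views": 2},
    {"device_id": "c", "views": 6},
]

print(capped_sum(docs, 5))  # 12
```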

My indices contain ~500,000 distinct devices.

The result does not need to be exact and can be approximate within reasonable bounds.

I do not mind writing my own script that uses a combination of techniques (HLL, Count-Min Sketch, Bloom filters, MinHash, ...).
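As one example of the kind of approximation I would accept, here is a rough Python sketch that samples devices by hashing device_id and scales the result back up. This is a plain hash-based sampling technique, not one of the sketches listed above, and the names `estimate_capped_sum` and `keep_one_in` are made up for illustration. Because sampling is by device rather than by document, every document of a kept device is counted, so the cap is still applied to complete per-device totals.

```python
import hashlib
from collections import defaultdict


def estimate_capped_sum(docs, cap, keep_one_in):
    """Estimate the capped sum from a deterministic 1-in-k sample of devices."""
    totals = defaultdict(int)
    for doc in docs:
        # Hash the device id so all docs of one device get the same decision.
        digest = hashlib.md5(doc["device_id"].encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % keep_one_in
        if bucket == 0:
            totals[doc["device_id"]] += doc["views"]
    # Cap each sampled device's total, then scale up by the sampling factor.
    sampled = sum(min(total, cap) for total in totals.values())
    return sampled * keep_one_in
```

With `keep_one_in=1` every device is kept and the result is exact; larger values trade accuracy for less work, and the error depends on how views are distributed across devices.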
