I have a very simple data structure called View that is created every time a user watches a video on my page, churnTime contains the time the user left the video before it ended, if the user watched it until the end the churnTime is equal to the total time of the video, I need to do a retention calculation which basically is to calculate the loss of audience over time. The graph would have the value in % on the y-axis and on the x-axis would have the total time of the video (in seconds), so we know that at time 0 (seconds) the retention is 100% and decays throughout the video forming graphs one. My question is how to run a script in each bucket using variables to calculate the retention value in percentage in each bucket.
View = {
createdAt: number;
churnTime: number;
playerId: string;
videoDuration: number;
buttonClicked: boolean;
}
I tried this approach, but i dont know how to use the count results and access de doc_count of each bucket to calculate the retention:
{
"query": {
"bool": {
"must": [
{
"match": {
"playerId": "player_YHgs542d"
}
},
{
"range": {
"createdAt": {
"gte": 1714970337821,
"lte": 1725597537821
}
}
}
]
}
},
"aggs": {
"churntime_histogram": {
"histogram": {
"field": "churnTime",
"interval": 1,
"min_doc_count": 1
},
"aggs": {
// calculate retention in each bucket
// ((number of query results - doc_count of bucket) / number of query results) \* 100
// exemple to be executed in each bucket -\> ((1000(total query results count) - 10(doc_count of bucket)) / 1000) \* 100
}
}
}
}
this is the result of the query:
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"churntime_histogram": {
"buckets": [
{
"key": 1,
"doc_count": 4
},
{
"key": 2,
"doc_count": 11
},
{
"key": 3,
"doc_count": 15
},
{
"key": 4,
"doc_count": 18
},
{
"key": 7,
"doc_count": 21
},
...