NEWBIE: how to use variables inside an Elasticsearch bucket script

I have a very simple data structure called View that is created every time a user watches a video on my page. churnTime contains the time at which the user left the video; if the user watched it until the end, churnTime equals the total duration of the video. I need to do a retention calculation, which is basically calculating the loss of audience over time. The graph would have the retention value in % on the y-axis and the elapsed time of the video (in seconds) on the x-axis, so at time 0 (seconds) retention is 100% and it decays throughout the video. My question is how to run a script in each bucket, using variables, to calculate the retention percentage for each bucket.

View = {
 createdAt: number;
 churnTime: number;
 playerId: string;
 videoDuration: number;
 buttonClicked: boolean;
}
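
For reference, an indexed View document would look something like this (the values here are made up for illustration):

```json
{
  "createdAt": 1714970400000,
  "churnTime": 42,
  "playerId": "player_YHgs542d",
  "videoDuration": 120,
  "buttonClicked": false
}
```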

I tried this approach, but I don't know how to use the count results and access the doc_count of each bucket to calculate the retention:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "playerId": "player_YHgs542d"
          }
        },
        {
          "range": {
            "createdAt": {
              "gte": 1714970337821,
              "lte": 1725597537821
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "churntime_histogram": {
      "histogram": {
        "field": "churnTime",
        "interval": 1,
        "min_doc_count": 1
      },
      "aggs": {
        // calculate retention in each bucket
        // ((number of query results - doc_count of bucket) / number of query results) * 100
        // example to be executed in each bucket -> ((1000 (total query result count) - 10 (doc_count of bucket)) / 1000) * 100
      }
    }
  }
}
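
One way to express this per-bucket calculation is with pipeline sub-aggregations inside the histogram: a cumulative_sum over _count plus a bucket_script. This is a sketch, not tested; the total of 1000 is a placeholder that would have to come from a previous query's hits.total, since the overall hit count is not accessible inside a bucket script. The cumulative_sum adds up doc_count so each bucket knows how many viewers have left by that second, which is what makes the curve decay:

```json
"aggs": {
  "churntime_histogram": {
    "histogram": {
      "field": "churnTime",
      "interval": 1,
      "min_doc_count": 1
    },
    "aggs": {
      "cumulative_churn": {
        "cumulative_sum": {
          "buckets_path": "_count"
        }
      },
      "retention": {
        "bucket_script": {
          "buckets_path": {
            "churned": "cumulative_churn"
          },
          "script": {
            "source": "((params.total - params.churned) / params.total) * 100",
            "params": {
              "total": 1000
            }
          }
        }
      }
    }
  }
}
```

If you literally want the formula from the comments above (subtracting only the current bucket's doc_count rather than the running total), replace "cumulative_churn" in buckets_path with "_count".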

This is the result of the query:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "churntime_histogram": {
      "buckets": [
        {
          "key": 1,
          "doc_count": 4
        },
        {
          "key": 2,
          "doc_count": 11
        },
        {
          "key": 3,
          "doc_count": 15
        },
        {
          "key": 4,
          "doc_count": 18
        },
        {
          "key": 7,
          "doc_count": 21
        },
        ...

I would calculate a churnPercentage field (or similar) and index this per document. As this is known at indexing time, I don't see the point in adding the complexity and latency of repeatedly recalculating it for every document on each query, if this is something that is frequently queried and analysed.
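
For example, if churnPercentage means how far through the video the viewer got, an ingest pipeline with a script processor could compute it per document at index time. A sketch, with a hypothetical pipeline name:

```json
PUT _ingest/pipeline/compute-churn-percentage
{
  "description": "Adds churnPercentage (percent of video watched) to each View at index time",
  "processors": [
    {
      "script": {
        "source": "ctx.churnPercentage = 100.0 * ctx.churnTime / ctx.videoDuration"
      }
    }
  ]
}
```

Documents would then be indexed with `?pipeline=compute-churn-percentage`, so each View carries its own percentage without any per-query recalculation.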


Thanks for your reply! Could you explain a little better how I could do this at indexing time? I didn't know this was possible. If that is the case, with each new view that is added, wouldn't I have to update the churnPercentage field of all other documents that have the same churnTime?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.