NEWBIE: how to use variables inside an Elasticsearch bucket script

I have a very simple data structure called View that is created every time a user watches a video on my page. churnTime contains the time at which the user left the video; if the user watched it until the end, churnTime equals the total duration of the video. I need to do a retention calculation, which is basically calculating the loss of audience over time. The graph would have the retention value in % on the y-axis and the elapsed time of the video (in seconds) on the x-axis, so at time 0 (seconds) retention is 100% and it decays throughout the video. My question is how to run a script in each bucket, using variables, to calculate the retention percentage for each bucket.

View = {
 createdAt: number;
 churnTime: number;
 playerId: string;
 videoDuration: number;
 buttonClicked: boolean;
}
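
For reference, an indexed View document would look something like this (the values here are made up for illustration):

```json
{
  "createdAt": 1714970400000,
  "churnTime": 42,
  "playerId": "player_YHgs542d",
  "videoDuration": 120,
  "buttonClicked": false
}
```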

I tried this approach, but I don't know how to use the count results and access the doc_count of each bucket to calculate the retention:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "playerId": "player_YHgs542d"
          }
        },
        {
          "range": {
            "createdAt": {
              "gte": 1714970337821,
              "lte": 1725597537821
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "churntime_histogram": {
      "histogram": {
        "field": "churnTime",
        "interval": 1,
        "min_doc_count": 1
      },
      "aggs": {
        // calculate retention in each bucket
        // ((number of query results - doc_count of bucket) / number of query results) * 100
        // example to be executed in each bucket -> ((1000 (total query result count) - 10 (doc_count of bucket)) / 1000) * 100
      }
    }
  }
}
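
One way to express this per-bucket calculation is with pipeline sub-aggregations inside the histogram: a cumulative_sum over _count plus a bucket_script. This is a sketch, not tested; the total of 1000 is a placeholder that would have to come from a previous query's hits.total, since the overall hit count is not accessible inside a bucket script. The cumulative_sum adds up doc_count so each bucket knows how many viewers have left by that second, which is what makes the curve decay:

```json
"aggs": {
  "churntime_histogram": {
    "histogram": {
      "field": "churnTime",
      "interval": 1,
      "min_doc_count": 1
    },
    "aggs": {
      "cumulative_churn": {
        "cumulative_sum": {
          "buckets_path": "_count"
        }
      },
      "retention": {
        "bucket_script": {
          "buckets_path": {
            "churned": "cumulative_churn"
          },
          "script": {
            "source": "((params.total - params.churned) / params.total) * 100",
            "params": {
              "total": 1000
            }
          }
        }
      }
    }
  }
}
```

If you literally want the formula from the comments above (subtracting only the current bucket's doc_count rather than the running total), replace "cumulative_churn" in buckets_path with "_count".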

This is the result of the query:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "churntime_histogram": {
      "buckets": [
        {
          "key": 1,
          "doc_count": 4
        },
        {
          "key": 2,
          "doc_count": 11
        },
        {
          "key": 3,
          "doc_count": 15
        },
        {
          "key": 4,
          "doc_count": 18
        },
        {
          "key": 7,
          "doc_count": 21
        },
        ...

I would calculate a churnPercentage field (or similar) and index this per document. As this is known at indexing time, I don't see the point in adding the complexity and latency of repeatedly recalculating it for every document on each query, if this is something that is frequently queried and analysed.
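
For example, if churnPercentage means how far through the video the viewer got, an ingest pipeline with a script processor could compute it per document at index time. A sketch, with a hypothetical pipeline name:

```json
PUT _ingest/pipeline/compute-churn-percentage
{
  "description": "Adds churnPercentage (percent of video watched) to each View at index time",
  "processors": [
    {
      "script": {
        "source": "ctx.churnPercentage = 100.0 * ctx.churnTime / ctx.videoDuration"
      }
    }
  ]
}
```

Documents would then be indexed with `?pipeline=compute-churn-percentage`, so each View carries its own percentage without any per-query recalculation.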


Thanks for your reply! Could you explain a little better how I could do this at indexing time? I didn't know this was possible. If that is the case, with each new view that is added, wouldn't I have to update the churnPercentage field of all other documents that have the same churnTime?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.