How would you query Elasticsearch to get ratios?

My data in Elasticsearch looks like this:

  1. I have documents that track the number of visits on our website for each page:
    {
    _index: "visit_index"
    _type: "visit",
    ip: "192.168.1.1",
    page: "index.html"
    }

  2. I have documents that track when a video on a page is played:
    {
    _index: "video_played_index"
    _type: "video_played",
    ip: "192.168.1.1",
    page: "index.html",
    video_title: "my_video"
    }

I would like to perform a query that will get me top pages in terms of engagements. In other words, I want the top pages by ratio of the numbers of video played divided by the number of visits.

This means that I am more interested in a page that got 10 visits and 9 video plays than a page that got 1000 visits and 500 video plays.

How would you do that using Elasticsearch? Will I need to update my document and index setup?

You can use a scripted metric aggregation. I do something similar for click-thru ratios (clicks/searches) where yours is plays/visits. Something like this maybe:

GET /visitindex,videoplayed_index/visit,videoplayed/_search?search_type=count
{
  "aggs": {
    "ratio": {
      "scripted_metric": {
        "init_script": "_agg['play'] = _agg['visit'] = 0",
        "map_script": "_agg[doc['type'].value] += 1",
        "reduce_script": "plays = visits = 0; for (agg in _aggs) {  plays += agg.play ; visits += agg.visit ;}; return 100 * plays / visits"

      }
    },
    "type": {
      "terms": {
        "field": "_type"
      }
    }
  }
}
3 Likes

This looks great! Thanks a lot!
How well does the scripted metric aggregation scale to massive indices? Is it still efficient over hundreds of millions of such documents?

This might be of interest to you in case you got dynamic scripts disabled:

https://www.elastic.co/blog/scripting-security

As for how it performs, check how much data comes back and see if you can add a cacheable filter which improves the query and then maybe the scripts won't be so expensive.