How would you query Elasticsearch to get ratios?


(Asimov4) #1

My data in Elasticsearch looks like this:

  1. I have documents that track the number of visits on our website for each page:
    {
    _index: "visit_index"
    _type: "visit",
    ip: "192.168.1.1",
    page: "index.html"
    }

  2. I have documents that track when a video on a page is played:
    {
    _index: "video_played_index"
    _type: "video_played",
    ip: "192.168.1.1",
    page: "index.html",
    video_title: "my_video"
    }

I would like to perform a query that will get me top pages in terms of engagements. In other words, I want the top pages by ratio of the numbers of video played divided by the number of visits.

This means that I am more interested in a page that got 10 visits and 9 video plays than a page that got 1000 visits and 500 video plays.

How would you do that using Elasticsearch? Will I need to update my document and index setup?


Type casting in Groovy reduce script of scripted metric aggregation
(Loren Siebert) #2

You can use a scripted metric aggregation. I do something similar for click-thru ratios (clicks/searches) where yours is plays/visits. Something like this maybe:

GET /visitindex,videoplayed_index/visit,videoplayed/_search?search_type=count
{
  "aggs": {
    "ratio": {
      "scripted_metric": {
        "init_script": "_agg['play'] = _agg['visit'] = 0",
        "map_script": "_agg[doc['type'].value] += 1",
        "reduce_script": "plays = visits = 0; for (agg in _aggs) {  plays += agg.play ; visits += agg.visit ;}; return 100 * plays / visits"

      }
    },
    "type": {
      "terms": {
        "field": "_type"
      }
    }
  }
}

(Asimov4) #3

This looks great! Thanks a lot!
How well does the scripted metric aggregation scale to massive indices? Is it still efficient over hundreds of millions of such documents?


(Sarwar Bhuiyan) #4

This might be of interest to you in case you got dynamic scripts disabled:

As for how it performs, check how much data comes back and see if you can add a cacheable filter which improves the query and then maybe the scripts won't be so expensive.


(system) #5