I have documents that track the number of visits on our website for each page:
{
_index: "visit_index"
_type: "visit",
ip: "192.168.1.1",
page: "index.html"
}
I have documents that track when a video on a page is played:
{
_index: "video_played_index"
_type: "video_played",
ip: "192.168.1.1",
page: "index.html",
video_title: "my_video"
}
I would like to perform a query that will get me top pages in terms of engagements. In other words, I want the top pages by ratio of the numbers of video played divided by the number of visits.
This means that I am more interested in a page that got 10 visits and 9 video plays than a page that got 1000 visits and 500 video plays.
How would you do that using Elasticsearch? Will I need to update my document and index setup?
You can use a scripted metric aggregation. I do something similar for click-thru ratios (clicks/searches) where yours is plays/visits. Something like this maybe:
This looks great! Thanks a lot!
How well does the scripted metric aggregation scale to massive indices? Is it still efficient over hundreds of millions of such documents?
As for how it performs, check how much data comes back and see if you can add a cacheable filter which improves the query and then maybe the scripts won't be so expensive.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.