Adding custom ranking statistics

Good day everyone!

I'm fairly new to ES, and the query DSL is still a bit of terra incognita for me. I've googled this topic extensively and haven't found a solution. The idea is simple:

  1. We have a search index. We run a search query "foo" and get back some documents (ids) with scores: [{ "_id": "doc_id_1", "_score": 42 }, ...etc... ]
  2. Additionally, we have a big bad index with ranking statistics per search query. It can be organised as tuples like ("foo", { "doc_id_1": 1.0, "doc_id_2": 2.0 }) in some index or wherever else.
  3. For the given query, we need to add (or multiply, or otherwise combine) the stored value for every found document id (where one is present, of course) to get the final ranking score.
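To make the three steps concrete, here is a minimal client-side sketch in Python of the combination being asked for. It only illustrates the intended result and is not how ES would do it internally; the function name `apply_query_stats` and the extra hit `doc_id_3` are made up for the example.

```python
from typing import Callable, Dict, List


def apply_query_stats(
    hits: List[dict],
    stats: Dict[str, float],
    combine: Callable[[float, float], float] = lambda score, boost: score + boost,
) -> List[dict]:
    """Combine each hit's _score with its per-query statistic, if one exists."""
    rescored = []
    for hit in hits:
        boost = stats.get(hit["_id"])
        # Documents without a statistic for this query keep their original score.
        score = hit["_score"] if boost is None else combine(hit["_score"], boost)
        rescored.append({"_id": hit["_id"], "_score": score})
    # Re-rank by the combined score, highest first.
    return sorted(rescored, key=lambda h: h["_score"], reverse=True)


# Step 1: hits returned for the query "foo" (doc_id_3 is a hypothetical extra hit).
hits = [{"_id": "doc_id_1", "_score": 42.0}, {"_id": "doc_id_3", "_score": 41.5}]
# Step 2: the statistics stored for the query "foo".
stats_for_foo = {"doc_id_1": 1.0, "doc_id_2": 2.0}
# Step 3: the final ranking.
final = apply_query_stats(hits, stats_for_foo)
print(final)
```

Passing `combine=lambda score, boost: score * boost` would give the multiplicative variant instead.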

I've tried different approaches, including terms queries, but I couldn't reach this level of granularity (a boosting value per doc id for a given search string).

Also, the stats per query can be arbitrarily large, so I can't simply POST endless boosting instructions for every doc_id. There are no joins in ES, and the ranking function can't be expressed analytically, only as a lookup table.

As you've noted, there are no joins in ES and a join is pretty much what you need.

So there are two options I can see.

First, denormalise: in other words, store all of the searches and scores in each document, so that you can look up the ranking for a particular query in each doc and incorporate it into the score. I can imagine that this approach would soon get out of hand though, depending on how many ranked searches you have.
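As a rough illustration of what the denormalised layout could look like, here is a small Python sketch. The field name `query_scores` and the helper `boost_for` are hypothetical, chosen only for this example.

```python
# A document that carries its own per-query ranking statistics.
# "query_scores" is a hypothetical field name; "title" stands in for real content.
doc = {
    "title": "some document",
    "query_scores": {"foo": 1.0, "bar": 0.25},
}


def boost_for(document: dict, query: str) -> float:
    """Return the stored statistic for this query, or 0.0 if there is none."""
    return document.get("query_scores", {}).get(query, 0.0)


print(boost_for(doc, "foo"))
print(boost_for(doc, "unseen query"))
```

Every ranked search adds one entry to every document it touches, which is why this can grow out of hand.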

The second option is to build a map of { doc_1: 1.0, doc_2: 2.0,...} for the current query (ie load it from your ranking index) and pass that in as a parameter to a script which checks for the presence of the current doc id in the map, and if found, applies the score change.
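A sketch of what that request body might look like, built as a plain Python dict. Several things here are assumptions rather than anything from this thread: that each document stores its id in a keyword field called `doc_id`, that the searched text field is called `body`, that scripts are written in Painless, and that the script is passed under the `source` key (older versions used `inline`). The script has not been tested against any particular ES version.

```python
import json

# Painless sketch: add the per-query boost to the score when the doc id is in the map.
# Assumes a hypothetical keyword field "doc_id" holding each document's id.
PAINLESS_SOURCE = (
    "def b = params.boosts.get(doc['doc_id'].value); "
    "return b == null ? _score : _score + b;"
)


def build_boosted_query(text: str, boosts: dict) -> dict:
    """Build a function_score request body that applies per-document boosts."""
    return {
        "query": {
            "function_score": {
                # "body" is a hypothetical text field name.
                "query": {"match": {"body": text}},
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": PAINLESS_SOURCE,
                        "params": {"boosts": boosts},
                    }
                },
                # Use the script's return value as the final score.
                "boost_mode": "replace",
            }
        }
    }


# The map would be loaded from the ranking index for the current query.
request_body = build_boosted_query("foo", {"doc_id_1": 1.0, "doc_id_2": 2.0})
print(json.dumps(request_body, indent=2))
```

Keeping the map in `params` rather than inlining it into the script text means the script source stays the same for every query.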

This is probably something you want to do in a rescorer (ie after the main query has already executed).
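For the rescorer variant, the request body could be sketched roughly as follows. As before, the `doc_id` keyword field, the `body` text field, and the Painless script are assumptions made for illustration and are untested against a specific ES version.

```python
import json

# Painless sketch: return only the boost (0.0 when the doc id is not in the map).
# Assumes a hypothetical keyword field "doc_id" holding each document's id.
BOOST_ONLY_SOURCE = (
    "def b = params.boosts.get(doc['doc_id'].value); "
    "return b == null ? 0.0 : b;"
)


def build_rescore_body(text: str, boosts: dict, window_size: int) -> dict:
    """Build a request whose rescore phase adds per-document boosts."""
    return {
        # "body" is a hypothetical text field name.
        "query": {"match": {"body": text}},
        "rescore": {
            # Number of top hits per shard that the rescorer examines.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "query": {"match_all": {}},
                        "script_score": {
                            "script": {
                                "lang": "painless",
                                "source": BOOST_ONLY_SOURCE,
                                "params": {"boosts": boosts},
                            }
                        },
                        "boost_mode": "replace",
                    }
                },
                "query_weight": 1.0,
                "rescore_query_weight": 1.0,
                # "total" adds the weighted original and rescore scores.
                "score_mode": "total",
            },
        },
    }


rescore_body = build_rescore_body("foo", {"doc_id_1": 1.0, "doc_id_2": 2.0}, window_size=500)
print(json.dumps(rescore_body, indent=2))
```

Only the hits inside the window are rescored, so `window_size` has to be large enough to cover every document the boosts are meant to affect.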

One possible performance optimisation would be to load all scores into memory in a custom rescorer plugin, which then is applied to every query. This just skips the (possibly slow) step of loading, making a map, passing to a script.

Hello and thanks for your reply!

  1. Denormalisation doesn't suit my case, unfortunately: I need a score per search query and doc_id, so I'd have to add a map of entries like [ search_1 : score_1 ] to each doc, which (I think) would take a lot of space.

  2. I thought of that. For each query we collect roughly 1K-100K scores. I'm not sure it's good to build such queries, but I will try.
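One quick way to get a feel for the request size at 1K versus 100K scores is to serialise a map of that size and measure it. This is only a rough sketch: the ids of the form `doc_id_N` and the constant score of 1.0 are made up, and real ids may be considerably longer.

```python
import json


def params_payload_bytes(num_docs: int) -> int:
    """Size in bytes of a JSON-encoded boosts map with num_docs entries."""
    # Synthetic ids and scores, for sizing purposes only.
    boosts = {f"doc_id_{i}": 1.0 for i in range(num_docs)}
    return len(json.dumps({"boosts": boosts}).encode("utf-8"))


for n in (1_000, 100_000):
    print(n, "scores ->", params_payload_bytes(n), "bytes")
```

The resulting number can then be compared against the HTTP request size limit configured on the cluster.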

About rescorers: I found that they apply only to the top N results of the query (the window). I don't think that's good for me, because our scores can raise docs from the very depths of the results. But if I'm wrong, a rescorer could be the answer.

I've also found the LookupScript plugin example, which runs a search so that I can access another index's fields from my query (not sure I understand it right): https://github.com/imotov/elasticsearch-native-script-example/blob/7a30538f955f0d90b01fded701e08b75a5c094c1/src/main/java/org/elasticsearch/examples/nativescript/script/LookupScript.java. As I understand it, this plugin can output an additional script field (in my case, the map I need). But I haven't found out:

  1. Whether this code can access the document score to modify it, or (if that's impossible)
  2. Whether I can access the resulting map field from another script and calculate the final score there

Thanks!
