Question on sub-query scoring

loren · April 24, 2018, 5:08pm

I have query_string queries made up of 4 parts like this:

       (topic2_term1 OR topic2_term2 OR topic2_term3) AND
       (topic3_term1 OR topic3_term2) AND
       rare_term

I'm just querying a short title field and a longer content text field using the default BM25 similarity.

Typically topic1's terms are really popular in the corpus, topic2's less so, topic3's even less, and the final query term occurs infrequently in the corpus. The behavior I'm getting is that many of my highest scoring documents end up being about 2 or 3 of the topics, but barely mention the others.

I think what I'm trying to do here is prioritize documents that mention each of these subqueries equally. I don't want a document that is primarily about topic1 and just happens to mention the rare term once somewhere in the content.

Can someone suggest a way to go about this?

Right now I'm breaking the query apart and sending each subquery into a filters agg to get doc counts for each one.

  "aggregations": {
    "subs": {
      "buckets": {
        "topic1": {
          "doc_count": 13335846
        },
        "rare_term": {
          "doc_count": 225146
        },
        "topic2": {
          "doc_count": 1726988
        },
        "topic3": {
          "doc_count": 35396026
        }
      }
    }
  }
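
For readers following along, a request body that could produce buckets shaped like the ones above might be built as in the sketch below. It is only an illustration: the field names `title` and `content`, the helper name `filters_agg_body`, and the topic1 terms are assumptions, since the post does not show the actual request.

```python
# Sketch only: builds a search body with one named filter per subquery.
# Field names ("title", "content") and the topic1 terms are assumed placeholders.
subqueries = {
    "topic1": "(topic1_term1 OR topic1_term2)",  # hypothetical terms
    "topic2": "(topic2_term1 OR topic2_term2 OR topic2_term3)",
    "topic3": "(topic3_term1 OR topic3_term2)",
    "rare_term": "rare_term",
}


def filters_agg_body(subqueries, fields=("title", "content")):
    """Return a search body whose 'subs' aggregation counts docs per subquery."""
    named_filters = {
        name: {"query_string": {"query": text, "fields": list(fields)}}
        for name, text in subqueries.items()
    }
    return {
        "size": 0,  # only the bucket counts are needed, not the hits
        "aggs": {"subs": {"filters": {"filters": named_filters}}},
    }


body = filters_agg_body(subqueries)
```

The resulting `body` can be sent to the `_search` endpoint of the index; each key under `filters` comes back as a bucket with its own `doc_count`.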

Then I'm using that to try two different approaches:

1. Use the ratio between the most popular subquery's doc count and a given subquery's doc count as a boost. So if the most popular topic's terms occur in 157x more documents than the rare term, I use rare_term^157 in the original query_string.
2. Rescore the original query 4 times, starting with the rare_term. The idea here would be to take the top 1000 based on BM25 and then find the top scoring docs among them for the rare_term, and so on for the remaining subqueries.
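
The two approaches can be sketched roughly as follows, using the doc counts from the aggregation response above. This is one possible reading, not the poster's actual code: the helper names, the rounding of the ratio, the `title`/`content` field names, and the topic1 terms are all assumptions.

```python
# Doc counts copied from the filters aggregation response in the post.
doc_counts = {
    "topic1": 13335846,
    "topic2": 1726988,
    "topic3": 35396026,
    "rare_term": 225146,
}

# Subquery text; the topic1 terms are hypothetical placeholders.
subqueries = {
    "topic1": "(topic1_term1 OR topic1_term2)",
    "topic2": "(topic2_term1 OR topic2_term2 OR topic2_term3)",
    "topic3": "(topic3_term1 OR topic3_term2)",
    "rare_term": "rare_term",
}


def compute_boosts(counts):
    """Approach 1: boost = most popular doc count / this subquery's doc count."""
    most_popular = max(counts.values())
    return {name: round(most_popular / count) for name, count in counts.items()}


def boosted_query_string(subqueries, boosts):
    """Join the subqueries with AND, appending ^boost where the boost is not 1."""
    parts = []
    for name, text in subqueries.items():
        boost = boosts[name]
        parts.append(text if boost == 1 else f"{text}^{boost}")
    return " AND ".join(parts)


def rescore_chain(subqueries, counts, window_size=1000, fields=("title", "content")):
    """Approach 2: one rescore pass per subquery, rarest subquery first."""
    rarest_first = sorted(subqueries, key=lambda name: counts[name])
    return [
        {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "query_string": {"query": subqueries[name], "fields": list(fields)}
                }
            },
        }
        for name in rarest_first
    ]


boosts = compute_boosts(doc_counts)
query = boosted_query_string(subqueries, boosts)
chain = rescore_chain(subqueries, doc_counts)
```

With the counts above, the rounded boosts come out as 3 for topic1, 20 for topic2, 1 for topic3, and 157 for rare_term. The list returned by `rescore_chain` would go under the `rescore` key of the search body, where Elasticsearch applies the passes in order.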

Is there a third, better way?

system · May 22, 2018, 5:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.