[Control Relevance] How to boost docs that contain specific keywords?


(Xudong You) #1

Let me explain my question with a real example.

I have two docs:

POST /support/docs/1
{
  "title": "product 1",
  "description": "...get support for your device, how to videos, troubleshooting. Find out the warranty status..."
}

POST /support/docs/2
{
  "title": "product 2",
  "description": "... physical RAM ..."
}

Then I query the docs with the query string "how to find my RAM":

{
  "query": {
    "match": {
      "description": "how to find my RAM"
    }
  }
}

Doc #1 gets the higher score because it matches more of the words in the query (how, find). But I want doc #2 to score higher, since it contains the most meaningful word in the query (RAM).

So the general question is: how can I boost docs that contain specific words? I can build a keyword list containing all the words I want to boost.

Is this doable?


(Lee Hinman) #2

You can boost certain words by putting each of them into its own should clause of a bool query and giving each clause an individual boost, for example:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "how to find my RAM" } }
      ],
      "should": [
        {
          "match": { "description": { "query": "RAM", "boost": 3 } }
        },
        {
          "match": { "description": { "query": "find", "boost": 2 } }
        }
      ]
    }
  }
}

Since you mentioned that you can build a keyword list, you can keep your regular query in the must clause and boost particular terms with additional match queries, each with its own boost, in the should clause.
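As a minimal sketch, assuming the client builds the request body as a plain Python dict, the must/should pattern could be generated from a keyword-to-boost mapping like this (`build_boosted_query` is a hypothetical helper, not part of any Elasticsearch client):

```python
import json


def build_boosted_query(user_query, keyword_boosts, field="description"):
    """Build a bool query: the full user query goes in `must`, and each
    keyword in keyword_boosts adds a boosted `match` clause to `should`."""
    should = [
        # One should clause per keyword, carrying that keyword's boost.
        {"match": {field: {"query": keyword, "boost": boost}}}
        for keyword, boost in keyword_boosts.items()
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: user_query}}],
                "should": should,
            }
        }
    }


if __name__ == "__main__":
    body = build_boosted_query("how to find my RAM", {"RAM": 3, "find": 2})
    print(json.dumps(body, indent=2))
```

With `{"RAM": 3, "find": 2}` this produces the same structure as the JSON query above; extending the mapping adds one boosted should clause per keyword without touching the must clause.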

Additionally, you could use the function_score query to do this, with the boost_factor function.
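A rough sketch of the function_score variant, again building the body as a Python dict. It uses the `weight` function, which newer Elasticsearch versions use in place of the deprecated `boost_factor`; on an older cluster `boost_factor` would go where `weight` is. The helper name is hypothetical, and the `term` filter assumes `description` is analyzed with a lowercasing analyzer such as the standard one, so keywords are lowercased before the lookup.

```python
def build_function_score_query(user_query, keyword_weights, field="description"):
    """Wrap the user's match query in function_score. Each keyword adds a
    `weight` function that applies only to docs containing that term."""
    functions = [
        # `term` looks up the exact indexed token, so lowercase the keyword
        # to match what a lowercasing analyzer stored (assumed mapping).
        {"filter": {"term": {field: keyword.lower()}}, "weight": weight}
        for keyword, weight in keyword_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: user_query}},
                "functions": functions,
                # Add up the weights of all functions whose filter matches.
                "score_mode": "sum",
                # Multiply the query's relevance score by that combined value.
                "boost_mode": "multiply",
            }
        }
    }
```

For example, `build_function_score_query("how to find my RAM", {"RAM": 3})` keeps the original match query and multiplies the score of documents containing the token `ram` by 3, which is the effect the question asks for on doc #2.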


(Xudong You) #3

If I understand correctly, you mean I first need to find the boost words in the query string, e.g. loop over the keyword list, check whether each keyword appears in the query string, and then put the matched keywords into the should clause. Right?
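The client-side step described above might look roughly like this in Python. It is a simplified sketch: the regex tokenizer is a crude stand-in for the index analyzer, only single-word keywords are handled, and both helper names are hypothetical.

```python
import re


def find_boost_keywords(query_string, keyword_boosts):
    """Return the keywords from keyword_boosts that appear as whole words
    in query_string (case-insensitive), mapped to their boosts."""
    # Lowercased word tokens; a rough approximation of what the analyzer does.
    tokens = set(re.findall(r"\w+", query_string.lower()))
    return {
        keyword: boost
        for keyword, boost in keyword_boosts.items()
        if keyword.lower() in tokens
    }


def to_should_clauses(matched, field="description"):
    """Turn matched keywords into boosted match clauses for a bool query."""
    return [
        {"match": {field: {"query": keyword, "boost": boost}}}
        for keyword, boost in matched.items()
    ]
```

For example, `find_boost_keywords("how to find my RAM", {"RAM": 3, "warranty": 2})` returns `{"RAM": 3}`, which `to_should_clauses` turns into a single boosted `match` clause on `description`. Comparing whole tokens rather than substrings avoids boosting `RAM` for a query such as "programming".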

If so, that works, but it isn't ideal: I have to write some client-side code for it.

Is there a better approach?

