How to Boost two fields

Hi,

I would like to boost the 'title' and 'description' fields in a search query and leave all the other fields (I have many) at boost 1. So I did:

GET index1/_doc/_search
{
  "query": {
    "query_string": {
      "query": "web",
      "fields": [
        "title^4",
        "description^2"
      ]
    }    
  }
}

Suppose 'web' appears in a field other than 'title' or 'description'. When I run the search, those documents are not returned.

Did I do something wrong?

Thanks.

One option would be to use the copy_to mapping option to copy all those other fields to a single field called "everything_else" and just add that to your list of fields in the above example.

This has other benefits: searching across many fields, each with its own independent index statistics, often produces counterintuitive results. Lucene favors rarity, so it tends to pick the most unusual context for a search term. A search for books on "web development", for example, would rank the document with "author: gary web" first, because "web" (a typo for the common name Webb) is rare in the author field. You can see that in this example:

DELETE test
PUT test
{
  "settings": {
	"number_of_replicas": 0,
	"number_of_shards": 1
  },
  "mappings": {
	"_doc": {
	  "properties": {
		"title": {
		  "type": "text"
		},
		"description": {
		  "type": "text"
		},
		"everything_else": {
		  "type": "text"
		},
		"author": {
		  "type": "text",
		  "copy_to":"everything_else"
		},
		"tags": {
		  "type": "keyword",
		  "copy_to":"everything_else"
		}
	  }
	}
  }
}
POST test/_doc/_bulk
{ "index":{} }
{"title": "web dev", "description":"html,css etc", "author":"gary webb",  "tags":["html","css"]}
{ "index":{} }
{"title": "web apps", "description":"tomcat  etc", "author":"paul smith",  "tags":["tomcat"]}
{ "index":{} }
{"title": "web etiquette", "description":"etiquette", "author":"sue jones",  "tags":["morals"]}
{ "index":{} }
{"title": "Cooking", "description":"Asian cuisine", "author":"gary web",  "tags":["food"]}

GET test/_doc/_search?explain=true
{
  "query": {
	"query_string": {
	  "query": "web"
	}    
  }
}

The solution is to reach for the cross_fields scoring strategy of multi_match or, perhaps more simply, to reindex and then search the everything_else copy_to field. Either approach blends the term statistics from several fields (author names, titles etc.) and avoids some of these odd ranking problems.
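
For reference, a minimal sketch of the cross_fields variant, assuming the test index above. The choice of fields and the boost values are only illustrative, and tags is left out because cross_fields groups fields by analyzer and tags is a keyword field:

```console
# Illustrative only: field list and boosts are assumptions, not a tuned setup
GET test/_doc/_search?explain=true
{
  "query": {
    "multi_match": {
      "query": "web",
      "type": "cross_fields",
      "fields": [
        "title^4",
        "description^2",
        "author"
      ]
    }
  }
}
```

Unlike the copy_to approach this needs no reindexing, and the per-field boosts can still be changed at query time.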

Thanks for your solution.

But how to boost title and description fields?

When I try:

GET test/_doc/_search?explain=true
{
  "query": {
    "query_string": {
      "query": "web",
      "fields": [
        "title^4",
        "description^2"
      ]
    }
  }
}

I get a total of 3 hits, but I would like to get all 4 documents.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.347925,
    "hits" : [....

Thanks.

They already are in your example?

Don't forget to add the "everything_else" field with the lower boost:

GET test/_doc/_search
{
  "query": {
    "query_string": {
      "query": "web",
      "fields": [
        "title^4",
        "description^2",
        "everything_else^1"
      ]
    }
  }
}

I found another approach that gives me the result I want:

GET test/_doc/_search?explain=true
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "web"
        }
      },
      "should": [
        {
          "query_string": {
            "query": "web",
            "fields": [
              "title^4",
              "description^2"
            ]
          }
        }
      ]
    }
  }
}

It may work with these 4 example docs but the more docs you add the rarer that author:web term becomes. The rarer it becomes the higher it scores and at some point you may see that the typo author:web ranks higher than your attempts to boost titles and descriptions.

Your boosts and IDF (term rarity) are both factors in ranking.
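
To put rough numbers on this: assuming the default BM25 similarity, the IDF part of a term's score in a field depends on N (the docs that have the field) and n (the docs where that field contains the term). The sketch below looks at the IDF factor only, ignores term frequency and length normalization, and assumes a hypothetical index that has grown to 1000 docs with "web" still in 75% of titles but in only one author field:

```latex
% IDF component only; the 1000-doc index is a made-up scenario
\begin{align*}
\mathrm{idf}(N, n) &= \ln\!\left(1 + \frac{N - n + 0.5}{n + 0.5}\right) \\[4pt]
% the 4 example docs
\text{author:web, 4 docs:}\quad \mathrm{idf}(4, 1) &= \ln\!\left(1 + \tfrac{3.5}{1.5}\right) \approx 1.20 \\
\text{title:web, boost 4, 4 docs:}\quad 4 \cdot \mathrm{idf}(4, 3) &= 4 \ln\!\left(1 + \tfrac{1.5}{3.5}\right) \approx 1.43 \\[4pt]
% the hypothetical 1000 docs
\text{author:web, 1000 docs:}\quad \mathrm{idf}(1000, 1) &= \ln\!\left(1 + \tfrac{999.5}{1.5}\right) \approx 6.50 \\
\text{title:web, boost 4, 1000 docs:}\quad 4 \cdot \mathrm{idf}(1000, 750) &= 4 \ln\!\left(1 + \tfrac{250.5}{750.5}\right) \approx 1.15
\end{align*}
```

On this factor alone the boosted title match is ahead with 4 docs, but at 1000 docs the rare author term wins despite the ^4 boost.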
