How to increase the score for exact word/phrase matches in Elasticsearch?

I have an index with movie franchises in it, and I would like exact word/phrase matches to score higher. For example, if I search for "Star Trek" I want "Star Trek" to score highest (first result), followed by "Star Trek Beyond" and "Star Trek Into Darkness". Currently, when I search for "Star Trek", the titles with additional words score higher. Is this possible, and how?

Also, is it possible to get the same results as described above if there is some additional unmatched text around the search term, for example: "(randomText) Star Trek (randomText)"?

Hi,

To perform your search you need to index your data correctly and use multi-fields.
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
For your title field you will have something like:

{ 
  "mappings": { 
     "properties": { 
       "title": { 
          "type": "text", 
          "fields": { 
             "raw": { "type": "keyword" },
             "space": { "type": "text", "analyzer": "whitespace" }
....

Then you can run a query_string query on the title.* fields:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_multi_field_2

with a boost on one of them, such as title.space^n:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boosting

Put the query string in a bool should clause and add a term query with a bigger boost.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

This way an exact match scores higher: "Star Trek" will come before "Star Trek Into Darkness", and every document like "Anything Star Trek Something else" will still match on Star Trek. Those documents will also rank before Star Wars, because Star Trek matches two tokens.
You can also add the description field to the query string if it helps improve the weighting and reduce the noise.
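
A minimal sketch of the query shape described above, written as a Python dict builder. The field names (title, title.space, title.raw) follow the example mapping; the helper name build_query and the boost values are illustrative assumptions, not tuned recommendations.

```python
def build_query(text, space_boost=2, exact_boost=5):
    """Build a bool/should query: analyzed match plus a boosted exact match.

    Hypothetical helper; field names follow the example mapping above.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        # Analyzed search on the title and its whitespace sub-field.
                        "query_string": {
                            "query": text,
                            "fields": ["title", f"title.space^{space_boost}"],
                        }
                    },
                    {
                        # Exact (un-analyzed) match on the keyword sub-field,
                        # with the largest boost so it ranks first.
                        "term": {
                            "title.raw": {"value": text, "boost": exact_boost}
                        }
                    },
                ]
            }
        }
    }


body = build_query("Star Trek")
```

Keep in mind that query_string parses its input as query syntax, so raw user input containing characters such as ':' or '/' may need escaping.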

Here is my current settings/mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title_english": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "space": { "type": "text", "analyzer": "whitespace" }
        },
        "analyzer": "autocomplete"
      },
      "title_native": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "space": { "type": "text", "analyzer": "whitespace" }
        },
        "analyzer": "autocomplete"
      },
      "title_romaji": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "space": { "type": "text", "analyzer": "whitespace" }
        },
        "analyzer": "autocomplete"
      },
      "title_synonyms": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "space": { "type": "text", "analyzer": "whitespace" }
        },
        "analyzer": "autocomplete"
      }
    }
  }
}

And here is my query. I am not getting the desired result.

		'query': {
			'bool': {
				'must': {
					'multi_match': {
						'query': request.args.get('query'),
						'analyzer': 'standard',
						'fields': ['title_*']
					},
				},
				'should': [{
					'term': {
						'title_*.keyword': {
							'value': request.args.get('query'),
							'boost': 3
						}
					}
				},
				{
					'prefix': {
						'title_*.keyword': {
							'value': request.args.get('query'),
							'boost': 2
						}
					}
				}]
			}
		}

I'm sorry, but I'm a little confused by your mapping.

Can you provide one or two documents to help me better understand your needs?

Just for information about title_native: there are analyzers for a wide range of languages, and in some languages, such as Japanese and Chinese, the whitespace analyzer doesn't work.
Here is a list of the supported languages.
https://www.elastic.co/guide/en/elasticsearch/reference/7.1/analysis-lang-analyzer.html

Thanks for your quick reply. The native field is not important. I will probably remove it. The title_english, title_romaji (latinized version of the Japanese name) and title_synonyms (a list/array) fields are all in Latin script. Here is the output for the search query "naruto". There is a doc with the exact title "Naruto", but it is not listed in the top 10 results.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 26,
      "relation" : "eq"
    },
    "max_score" : 10.998999,
    "hits" : [
      {
        "_index" : "anime",
        "_type" : "_doc",
        "_id" : "2266",
        "_score" : 10.998999,
        "_source" : {
          "title_romaji" : "Naruto: Shippuuden Movie 1",
          "title_english" : "Naruto Shippuden the Movie",
          "title_native" : "ナルト 疾風伝",
          "title_synonyms" : [
            "Gekijouban Naruto Shippuuden",
            "Naruto Movie 4",
            "Naruto Shippuuden Movie"
          ]
        }
      },
      {
        "_index" : "anime",
        "_type" : "_doc",
        "_id" : "3638",
        "_score" : 10.919544,
        "_source" : {
          "title_romaji" : "Naruto: Shippuuden Movie 2 - Kizuna",
          "title_english" : "Naruto: Shippuden the Movie 2 -Bonds-",
          "title_native" : "劇場版NARUTO-ナルト- 疾風伝 絆",
          "title_synonyms" : [
            "Naruto Movie 5",
            "Naruto Shippuuden Movie 2",
            "Naruto Shippuuden: Bonds"
          ]
        }
      },
      {
        "_index" : "anime",
        "_type" : "_doc",
        "_id" : "4552",
        "_score" : 10.846692,
        "_source" : {
          "title_romaji" : "Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono",
          "title_english" : "Naruto Shippuden the Movie: The Will of Fire",
          "title_native" : "ナルト- 疾風伝 火の意志を継ぐ者",
          "title_synonyms" : [
            "Naruto Movie 6",
            "Naruto Shippuuden 3: Inheritors of Will of Fire",
            "Naruto Shippuuden Movie 3",
            "Naruto Shippuuden: Gekijouban Naruto Shippuuden: Hi no Ishi o Tsugu Mono"
          ]
        }
      },
      {
        "_index" : "anime",
        "_type" : "_doc",
        "_id" : "1573",
        "_score" : 10.776477,
        "_source" : {
          "title_romaji" : "Naruto: Shippuuden",
          "title_english" : "Naruto: Shippuden",
          "title_native" : "ナルト- 疾風伝",
          "title_synonyms" : [
            "Naruto Shippuden",
            "Naruto Shippuuden"
          ]
        }
      }
    ]
  }
}

Is the native field only for Japanese? If so, consider a Japanese-specific analyzer; it will work better if you want your users to search in Japanese.

Do you really need autocomplete for all your fields?

Which fields are important for search? It's better to set the analyzers only on those fields; if the other fields are only for display, you can map them as keyword.

"title_english": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "space": { "type": "text", "analyzer": "whitespace" }
        },
        "analyzer": "autocomplete"
 },
"title_native": {
        "type": "keyword"
  },
"title_romaji": {
        "type": "keyword"
  },
"title_synonyms":{
   "type": "text",
   "analyzer": "whitespace"
}

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "fields": ["title_english^2", "title_synonyms"],
            "query": "naruto"
          }
        },
        {
          "term": {
            "title_english.raw": {
              "value": "Naruto",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

Does this one work? Sorry, there may be some syntax errors since I didn't test it, but you should get the idea.

Edit:

This searches the English and synonym fields but gives more weight to the English field. It also searches the raw field with the biggest boost for exact matches, so if you search for "Naruto", the first result will be Naruto.

Also, I forgot to mention: it's better to add a lowercase filter so that you always search in lowercase. Otherwise, if you search for naruto and your title is Naruto, the exact match will not hit.
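
For the keyword sub-field, one way to get this is a normalizer, since keyword fields are not analyzed. Below is a sketch as a Python dict; the normalizer name lowercase_normalizer and the helper name are assumptions, and the index would need to be recreated or reindexed for the change to apply.

```python
def raw_subfield_settings():
    """Settings/mapping fragment giving the 'raw' keyword sub-field a lowercase normalizer."""
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    # Hypothetical name; any name works as long as the mapping references it.
                    "lowercase_normalizer": {
                        "type": "custom",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title_english": {
                    "type": "text",
                    "fields": {
                        "raw": {
                            "type": "keyword",
                            # Lowercases values at index time and term-level queries at search time.
                            "normalizer": "lowercase_normalizer",
                        }
                    },
                }
            }
        },
    }


index_body = raw_subfield_settings()
```

With this in place, a term query for "naruto" on title_english.raw should be able to match a stored value of "Naruto".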

Just one last comment: you will never cover every case on the first try.
I always log searches: I create one monthly index where I save the keyword and the number of results (the hits.total field of the response) after running the search. This way I can figure out which keywords are the most popular and which return no results, so I can tune my search. But it takes time, and you will certainly add more fields and reindex your data several times before you have a good search.
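
A rough sketch of that kind of search log entry in Python. The index name pattern (search-log-YYYY.MM), the document field names, and the helper name are assumptions for illustration; sending the document to Elasticsearch is left out.

```python
from datetime import datetime, timezone


def search_log_entry(keyword, response, now=None):
    """Return (index_name, document) for logging one search.

    Assumes an Elasticsearch 7.x style response where hits.total is an
    object with a 'value' key; the index name pattern is hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    total = response["hits"]["total"]
    # Older versions return a plain integer instead of {"value": ..., "relation": ...}.
    count = total["value"] if isinstance(total, dict) else total
    index_name = f"search-log-{now:%Y.%m}"  # one index per month
    doc = {
        "keyword": keyword,
        "result_count": count,
        "timestamp": now.isoformat(),
    }
    return index_name, doc
```

Aggregating on keyword then shows the most popular searches, and filtering on result_count equal to 0 shows the searches that returned nothing.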

Just for reference, I have never tried it, but it may be helpful.

It seems to work, but it breaks and I get errors when I use non-letter characters, such as colons, slashes, or brackets.

Also, it is important that it searches title_romaji (primary), title_english and title_synonyms, as they could all be different and the user may search for a name in any one of those fields.

In my original code I added the 'analyzer': 'standard' because I am using edge_ngrams, so the input needs a different analyzer.

I'm still very new to ES, just started using it yesterday, and the documentation is at times kind of confusing. Is it possible to apply the standard analyzer somewhere in the bool/should?
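
One option that may help here, assuming the goal is edge n-grams at index time and plain tokens at query time: a text field mapping accepts a search_analyzer next to analyzer, so the query itself would not need to carry 'analyzer': 'standard'. A sketch in Python (the helper name is hypothetical; autocomplete is the analyzer defined in the settings earlier in the thread):

```python
def autocomplete_text_field(index_analyzer="autocomplete", search_analyzer="standard"):
    """Mapping for a text field that is n-grammed at index time only."""
    return {
        "type": "text",
        "analyzer": index_analyzer,          # used when documents are indexed
        "search_analyzer": search_analyzer,  # used when the field is queried
        "fields": {
            "raw": {"type": "keyword"},
            "space": {"type": "text", "analyzer": "whitespace"},
        },
    }


# Apply the same mapping to every title field that is searched.
mappings = {
    "properties": {
        name: autocomplete_text_field()
        for name in ("title_romaji", "title_english", "title_synonyms")
    }
}
```

Changing the mapping this way would require recreating the index and reindexing the documents.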

EDIT:
Also, I just noticed that when I use title_romaji instead of title_english, it does not rank "Naruto" on top even though both field values are the same.

This search query works best for me, but the only problem is that it won't rank exact word/phrase matches on top. Is it possible to do a multi_match and add a boost to exact keyword matches?

		'query': {
			'multi_match': {
				'query': request.args.get('query'),
				'analyzer': 'standard',
				'fields': ['title_*']
			}
		},

You need to add a term query, like this:

'query': {
	'bool': {
		'should': [
			{
				'multi_match': {
					'query': request.args.get('query'),
					'analyzer': 'standard',
					'fields': ['title_*']
				}
			},
			{
				'term': {
					'title_english.raw': {
						'value': 'Naruto',
						'boost': 3
					}
				}
			}
		]
	}
}

Instead of 'fields': ['title_*'], it is better to list the fields explicitly and add boosts to prioritize them. Give title_english a boost so that it ranks better, but keep it lower than the term boost:
'fields': ['title_english^2', 'title_synonyms', 'title_romaji']

It's better to experiment with analyzers and understand how they work first. You can check other topics about search; they may help you understand the basics and give you ideas on how to implement this.

OK looks like I got it working now. "Naruto" ranks on top when I just search for "naruto" and when I use non-letters it does not break. Everything seems to work the way I want it to. This is my query:

	"query": {
		"bool": {
			"should": [
				{
					'multi_match': {
						'query': request.args.get('query'),
						'analyzer': 'standard',
						'fields': ['title_romaji', 'title_english^2', 'title_synonyms']
					},
				},{
					"term": {
						"title_*.raw": {
							"value": request.args.get('query'),
							"boost": 3
						}
					}
				}
			]
		},
	}

But there is something I do not understand. Both title_english and title_romaji are the same (Naruto) but when I give title_romaji a boost it does not work, but when I give it to title_english it works the way it should. Any idea why this is happening?
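
One thing that may be worth checking: as far as I know, the term query expects a single concrete field name and does not expand wildcards the way multi_match does, so "title_*.raw" may not match any field. If so, the exact-match boost would not be contributing at all and the ranking would come only from multi_match. A sketch that adds one term clause per raw sub-field instead (the helper name, the field list, and the boost value are assumptions):

```python
RAW_FIELDS = ("title_romaji.raw", "title_english.raw", "title_synonyms.raw")


def build_title_query(text, exact_boost=3):
    """bool/should query: multi_match plus one exact-match term clause per raw field."""
    exact_clauses = [
        {"term": {field: {"value": text, "boost": exact_boost}}}
        for field in RAW_FIELDS
    ]
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "analyzer": "standard",
                            "fields": ["title_romaji", "title_english^2", "title_synonyms"],
                        }
                    },
                    # Each term clause names exactly one keyword sub-field.
                    *exact_clauses,
                ]
            }
        }
    }


body = build_title_query("Naruto")
```

Note that keyword sub-fields are case-sensitive unless they use a lowercase normalizer, so a search for "naruto" would not hit a stored "Naruto" through these term clauses.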

For now I will take a break from this and watch some tutorials and look up examples over the next few days to learn more. Thanks for your help and time! If you have any more helpful resources, please link them. Thanks!
