Ngram and score in query

weibin.wu · July 20, 2017, 9:50am

Hi Elasticsearch,

When we use an ngram tokenizer, each token comes with a start_offset:
{
"token": "vi",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "iv",
"start_offset": 1,
"end_offset": 3,
"type": "word",
"position": 1
}
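
For illustration, the offsets above can be reproduced outside Elasticsearch. This is only a minimal Python sketch, not the Lucene implementation: `ngram_tokens` is a hypothetical helper, and it assumes min_gram = max_gram = 2 and that the analyzed input was "vivo".

```python
def ngram_tokens(text, n=2):
    """Return fixed-size n-grams with character offsets, in the shape of the output above."""
    return [
        {
            "token": text[i:i + n],
            "start_offset": i,
            "end_offset": i + n,
            "type": "word",
            "position": i,
        }
        for i in range(len(text) - n + 1)
    ]


# Assumed input: "vivo" (the first two tokens match the ones quoted above)
tokens = ngram_tokens("vivo")
for t in tokens:
    print(t)
```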

Is it possible to give a match with a larger "start_offset" a lower score than a match with a smaller "start_offset"?

For example, I have two documents:
{"text" : "vivo"}
{"text" : "ivov"}
Both use ngram (min_gram: 2, max_gram: 2) as the tokenizer.
When I search for "iv", can I expect {"text" : "ivov"} to score higher than {"text" : "vivo"} because its "start_offset" is smaller?
For now, I see that they get the same score in this case.
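
One way to see why the scores tie: a match query scores on term statistics (how often the term occurs and how long the field is), and the two documents look identical on those counts. The sketch below is a rough Python illustration, not the actual similarity formula; `bigrams` is a hypothetical helper.

```python
from collections import Counter


def bigrams(text):
    """Split text into overlapping 2-character grams (min_gram = max_gram = 2)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


docs = ["vivo", "ivov"]
query_term = "iv"

stats = {}
for doc in docs:
    grams = bigrams(doc)
    stats[doc] = {
        "term_freq": Counter(grams)[query_term],  # occurrences of the query gram
        "field_length": len(grams),               # number of tokens in the field
        "first_offset": doc.find(query_term),     # start offset of the first occurrence
    }

for doc, s in stats.items():
    print(doc, s)
```

Only `first_offset` differs between the two documents, and that value is not part of the score.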

polyfractal · July 20, 2017, 8:40pm

You're correct that ngrams are only scored on how well they match, not on their position. I don't think there is a way to weight the score based on their offset.

What's the use case here? Do you want to weight matches at the start of the word higher than matches at the end? You could probably accomplish that manually using span queries, but it'd be a huge pain. If you can describe the motivation, I might be able to help work out an alternative method.
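
As a rough idea of what the span-query route could look like: a `span_first` query only matches when the term occurs within the first few positions of the field, so it can sit next to the regular match as an optional, boosted clause. The sketch below only builds the request body as a Python dict. The helper name `prefix_boosted_query` is hypothetical, the field name `text` is taken from the example in the question, the boost of 2.0 is an arbitrary assumption, and none of this has been run against a cluster. Note that `span_term` is not analyzed, so the value has to be exactly one gram; a longer search string would need one clause per gram, which is where the pain comes in.

```python
import json


def prefix_boosted_query(field, gram, boost=2.0):
    """Build a bool query: a plain match, plus a boosted clause when the gram is the first token."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: gram}}],
                "should": [
                    {
                        "constant_score": {
                            "filter": {
                                "span_first": {
                                    "match": {"span_term": {field: gram}},
                                    "end": 1,  # only a token at position 0 qualifies
                                }
                            },
                            "boost": boost,  # assumed value, tune as needed
                        }
                    }
                ],
            }
        }
    }


body = prefix_boosted_query("text", "iv")
print(json.dumps(body, indent=2))
```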

weibin.wu · July 31, 2017, 1:28am

Thanks Polyfractal,

I have a use case like this.
Two words: [vivo] [ivid].
Analyzer: ngram, min_gram: 2, max_gram: 2
Search: "match": {"text": "vi"}
vivo doc_id: 1
ivid doc_id: 2
It is supposed to return "vivo" first, because someone searching for "vi" is more likely looking for "vivo" than for "ivid".
However, ngram gives both the same score, and because of the doc_id, "ivid" is the first document returned.

Is there any way to solve this?

system · August 28, 2017, 1:28am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
