Boost, scoring and phrase search

Yannick_Smits_2 · June 20, 2011, 9:15pm

     Can we have query time index boost levels?

     How would you implement formatting based boosting? For instance,
having bold, italic and uppercase texts (HTML) get higher scores?

     I noticed phrase searches don't actually limit the search results
to matches within 1 phrase. So when searching for "fear of dark" it will also
match if fear and dark appear in different sentences ("Then it became dark.
He had fear of cats."). Is there a way to force the result to be in the same
sentence?

     Is proximity of words also taken into account when doing a
multi-word phrase search (the closer the words appear, the higher the score)?

Thanks,
Yannick

Yannick_Smits_2 · June 22, 2011, 6:50am

Could somebody help me with my Monday questions? Even if you only know the
answer to 1 of them it would be very helpful.

Thanks,
Yannick

Clinton_Gormley · June 22, 2011, 11:40am

Hi Yannick

On Mon, 2011-06-20 at 23:15 +0200, Yannick Smits wrote:

     Can we have query time index boost levels?

Yes. Look at the Query DSL docs - most queries take a 'boost'
parameter.

     How would you implement formatting based boosting. For
instance have bold, italic and uppercase texts (HTML) have higher
scores?

I don't think you can (at least not currently)

     I noticed phrase searches actually don't limit the search
results in to matches in 1 phrase. So when searching for "fear of
dark" it will also match if fear and dark appear in different
sentences ("Then it became dark. He had fear of cats."). Is there a
way to force the result to be in the same sentence?

I think you mean a 'text' query, not a 'text_phrase' query. A text
query will match text that contains the same words. A text_phrase query
will find text that includes exactly the same phrase (ignoring stop
words like 'of').

You can set the slop factor for text_phrase queries so that the words
don't have to be right next to each other.
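As a rough illustration of the slop setting mentioned above, here is a minimal Python sketch that only builds the request body. The helper name text_phrase_query is hypothetical, and the placement of "slop" next to "query" inside the field object is an assumption about the text_phrase syntax of this Elasticsearch version, so check it against the Query DSL docs before relying on it.

```python
import json

def text_phrase_query(field, phrase, slop=0):
    """Build a text_phrase search body (hypothetical helper).

    Assumption: "slop" sits next to "query" inside the field object.
    """
    return {
        "query": {
            "text_phrase": {
                field: {"query": phrase, "slop": slop}
            }
        }
    }

# Allow the phrase terms to be up to two positions out of place.
body = text_phrase_query("text", "fear dark", slop=2)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d in the same way as the other requests in this thread.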

     Is proximity of words also taken into account when doing a
multi-word phrase search (the closer words appear the higher the
scoring?)

With the text_phrase query, and slop, yes: proximity is taken into
account.

clint

Yannick_Smits_2 · June 22, 2011, 1:57pm

Hi Clinton,

I had a look at the docs but could not find a way to specify a boost value based on indices the documents are in, at query time, only as a configuration/static (Elasticsearch Platform — Find real-time answers at scale | Elastic). What am I missing?
Could you think of a strategy to simulate such a behavior? Like extracting the bold words and saving them with a higher boost to a different field or something, without screwing up the highlighting feature?
Yes, I'm using text_phrase. But still I would like to know if it is possible to have it look only within the phrase for the specified terms instead of matching over multiple phrases.

Thanks,
Yannick

Clinton_Gormley · June 22, 2011, 2:23pm

Hi Yannick

I had a look at the docs but could not find a way to specify a
boost value based on indices the documents are in, at query time, only
as a configuration/static
(Elasticsearch Platform — Find real-time answers at scale | Elastic). What am I missing?

The page you link to above does not say that indices_boost is static -
it is a parameter that you can pass to any search query.

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
   "query" : {
      "match_all" : {}
   },
   "indices_boost" : {
      "index_foo" : 10,
      "index_bar" : 1
   }
}
'

could you think of a strategy to simulate such a behavior? Like
extracting the bold words and saving them with a higher boost to a
different field or something without screwing up the highlighting
feature?

This would require some work on the client side. You could have two
fields: 'text' and 'important_text'. 'text' would contain all of the
text, and 'important_text' just the bits inside the tags.
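The client-side work could look roughly like the sketch below, which uses Python's standard html.parser to split a document into the two fields. The class and function names are hypothetical, the set of tags treated as "important" is an assumption, and uppercase text is not handled.

```python
from html.parser import HTMLParser

# Assumption: these tags mark the text that should score higher.
EMPHASIS_TAGS = {"b", "strong", "i", "em"}

class EmphasisExtractor(HTMLParser):
    """Collect all text, and separately the text nested inside emphasis tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # how many emphasis tags are currently open
        self.all_text = []
        self.important = []

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.all_text.append(data)
        if self.depth > 0:
            self.important.append(data)

def split_fields(html):
    """Return a document with 'text' and 'important_text' fields."""
    parser = EmphasisExtractor()
    parser.feed(html)
    parser.close()
    # Normalise whitespace so the fields are plain single-spaced strings.
    text = " ".join("".join(parser.all_text).split())
    important = " ".join(" ".join(parser.important).split())
    return {"text": text, "important_text": important}

doc = split_fields("<p>He had <b>fear</b> of the <i>dark</i>.</p>")
print(doc)
```

The resulting dictionary is what would be sent as the document body when indexing.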

Then you can use a bool query to boost anything found in important_text,
but only do the highlighting on the 'text' field.
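A minimal sketch of such a request body, built in Python, might look like this. The helper name boosted_search and the boost value are hypothetical, and it assumes the text query accepts a "boost" parameter in its object form, as most queries do.

```python
import json

def boosted_search(terms, boost=2.0):
    """Build a bool query that must match 'text' and scores higher
    when the same terms also occur in 'important_text'."""
    return {
        "query": {
            "bool": {
                # Every hit has to match the full text.
                "must": [{"text": {"text": terms}}],
                # Optional clause: only adds to the score when it matches.
                "should": [
                    {"text": {"important_text": {"query": terms, "boost": boost}}}
                ],
            }
        },
        # Highlight only the full text field.
        "highlight": {"fields": {"text": {}}},
    }

body = boosted_search("fear dark")
print(json.dumps(body, indent=2))
```

Because the boosted clause sits under "should", documents without any emphasised match are still returned, just with a lower score.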

yes, I'm using text_phrase. But still I would like to know if it is
possible to have it look only within the phrase for the specified
terms instead of matching over multiple phrases.

ES has no concept of sentences. I thought of possibly breaking up the
content into individual sentences, eg:

[ 'The quick brown fox', 'jumped over the lazy dog']

but it looks like ES just concatenates these values anyway:

[Wed Jun 22 16:22:35 2011] Protocol: http, Server: 192.168.5.103:9200

curl -XPOST 'http://127.0.0.1:9200/foo/bar?pretty=1' -d '
{
   "text" : [
      "The quick brown fox",
      "jumped over the lazy dog"
   ]
}
'

[Wed Jun 22 16:22:35 2011] Response:

{
   "ok" : true,
   "_index" : "foo",
   "_id" : "-Dt7zDUCQKauV69L_32w9g",
   "_type" : "bar",
   "_version" : 1
}

[Wed Jun 22 16:22:49 2011] Protocol: http, Server: 192.168.5.103:9200

curl -XGET 'http://127.0.0.1:9200/foo/_search?pretty=1' -d '
{
   "query" : {
      "text_phrase" : {
         "text" : "brown jumped"
      }
   }
}
'

[Wed Jun 22 16:22:49 2011] Response:

{
   "hits" : {
      "hits" : [],
      "max_score" : null,
      "total" : 0
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 2
}

[Wed Jun 22 16:22:54 2011] Protocol: http, Server: 192.168.5.103:9200

curl -XGET 'http://127.0.0.1:9200/foo/_search?pretty=1' -d '
{
   "query" : {
      "text_phrase" : {
         "text" : "fox jumped"
      }
   }
}
'

[Wed Jun 22 16:22:54 2011] Response:

{
   "hits" : {
      "hits" : [
         {
            "_source" : {
               "text" : [
                  "The quick brown fox",
                  "jumped over the lazy dog"
               ]
            },
            "_score" : 0.23013961,
            "_index" : "foo",
            "_id" : "-Dt7zDUCQKauV69L_32w9g",
            "_type" : "bar"
         }
      ],
      "max_score" : 0.23013961,
      "total" : 1
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 2
}
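
The two results above ("brown jumped" misses, "fox jumped" hits) are what you would expect if the array values are indexed as one continuous run of token positions. The Python sketch below is a toy model of that behaviour, not Elasticsearch code: the tokenizer is a plain lowercase whitespace split, and the gap parameter is a hypothetical knob showing how a position gap between values would stop a phrase from matching across them. Whether this Elasticsearch version exposes such a setting is not established in this thread.

```python
def index_positions(values, gap=0):
    """Assign a position to every token; `gap` (hypothetical) is the number
    of extra positions skipped between consecutive array values."""
    positions = {}
    pos = 0
    for value in values:
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap
    return positions

def phrase_match(positions, phrase):
    """True if the phrase tokens occur at consecutive positions."""
    tokens = phrase.lower().split()
    if not tokens:
        return False
    return any(
        all(start + i in positions.get(tok, []) for i, tok in enumerate(tokens))
        for start in positions.get(tokens[0], [])
    )

doc = ["The quick brown fox", "jumped over the lazy dog"]

flat = index_positions(doc)             # values simply concatenated
print(phrase_match(flat, "fox jumped"))    # True, as in the last response
print(phrase_match(flat, "brown jumped"))  # False, as in the second response

gapped = index_positions(doc, gap=100)  # hypothetical gap between values
print(phrase_match(gapped, "fox jumped"))  # False
```

With a gap in place, storing one sentence per array value would keep an exact phrase match inside a single sentence, which is the behaviour asked for earlier in the thread.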
