Boost value behaviour

Hi,

Hopefully a simple question, but one I'm finding hard to clear up from the
documentation.

When you set a boost on a query, for example:

{
  "query": {
    "bool": {
      "must": [
        {"query_string": {"fuzzy_prefix_length": 3, "query": "description:SEARCH_TERM"}}
      ],
      "should": [
        {"range": {"item.rating": {"gt": 1000, "boost": 100}}},
        {"range": {"item.rating": {"lt": 10, "boost": 0.01}}}
      ]
    }
  },
  "timeout": "15s",
  "size": 10,
  "from": 0
}

How, exactly, are those boost values applied to the score in the results? I
had assumed they were multipliers of some sort, so that 0.01 would reduce
the score while 100 would increase it. However, messing about with various
values would suggest they're actually added somewhere along the line, so
0.01 increases the score a tiny amount, 100 a large amount, and something
like -10 is required to have a negative impact on the score.

Indeed I had decided that they must be added somewhere, until I noted in
the documentation that the default boost was 1. Which again made me wonder
if it was a multiplier. It seems a curious default to use if the boost is
added.

So, in the above query, what are those boost values actually doing to the
scores? And for those records that don't match either range (so have a
rating between 10 and 1000), what boost will be applied: the default of 1,
or nothing because it's not specified?

Cheers,
Matt

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It's kinda multiplied, but then also normalized - I'm not sure what the
exact calculation is.

You can set "explain": true in your search request to get details about how
the score was calculated.
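
For reference, a minimal sketch of such a request body. It is built in Python only so that it can be printed as JSON; the must clause is copied from the query at the top of the thread and the rest is trimmed for brevity.

```python
import json

# Search body from the top of the thread, trimmed to one clause,
# with "explain" switched on at the top level of the request.
search_body = {
    "explain": True,
    "query": {
        "bool": {
            "must": [
                {"query_string": {"query": "description:SEARCH_TERM"}}
            ]
        }
    },
    "size": 10,
}

print(json.dumps(search_body, indent=2))
```

Each hit should then come back with an _explanation section that breaks its score into parts.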

Also, if you want to apply boost without normalization, then look at the
custom_boost_factor query:
http://www.elasticsearch.org/guide/reference/query-dsl/custom-boost-factor-query.html

clint

Thanks! That "explain" : true is pretty handy. I was very frustrated by how
little information I was getting back from the queries.

OK, so the normalisation is basically wiping out my boosting. Is it
possible to use custom_boost_factor in a "should" style query as I have
done? I'm not sure how one would construct the syntax.
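
One way this might be written, as a sketch: each range query is wrapped in a custom_boost_factor query, and the wrapper goes in the "should" list in place of the bare range. The query / boost_factor layout follows the custom_boost_factor reference page linked earlier in the thread. The helper name boosted_range is made up for this example, and the result has not been run against a cluster.

```python
import json

def boosted_range(field, bounds, factor):
    """Wrap a range query in custom_boost_factor (hypothetical helper)."""
    return {
        "custom_boost_factor": {
            "query": {"range": {field: bounds}},
            "boost_factor": factor,
        }
    }

query = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {"fuzzy_prefix_length": 3,
                                  "query": "description:SEARCH_TERM"}}
            ],
            # Same two ranges as the original query, boost moved to the wrapper.
            "should": [
                boosted_range("item.rating", {"gt": 1000}, 100),
                boosted_range("item.rating", {"lt": 10}, 0.01),
            ],
        }
    }
}

print(json.dumps(query, indent=2))
```

Note that a "should" clause only ever adds to the score when it matches, so a factor of 0.01 would still nudge the score up slightly rather than down.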

On Wednesday, 30 January 2013 10:15:48 UTC, Clinton Gormley wrote:

[...]

Also, I note that the boost is added, but only after it's been
normalised. So the relevant "explain" seems to get a score, add the
normalised boost, and then multiply the total by a "coord" score (no idea
where that comes from).

On Wednesday, 30 January 2013 10:33:30 UTC, Matt Thrower wrote:

[...]

On Wed, 2013-01-30 at 02:37 -0800, Matt Thrower wrote:

Also, I note that the boost is added, but only after it's been
normalised. So the relevant "explain" seems to get a score, add the
normalised boost, and then multiply the total by a "coord" score (no
idea where that comes from).

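The coord comes from a bool query (either used explicitly or auto-generated
by another query). It is the fraction of the bool query's clauses that a
document matched. E.g. if you have:

{ "bool": {
    "should": [{...}, {...}, {...}]
}}

i.e. three clauses, then a document matching only one of them gets a coord
of 1/3, and one matching all three gets a coord of 1. Each matching clause
contributes 1/3 to the coord.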
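
To make the arithmetic concrete, here is a small sketch of the calculation as the explain output describes it: sum the scores of the clauses that matched, then multiply by coord. It assumes coord is the number of matching clauses divided by the total number of clauses, which is how Lucene's default similarity defines it. The clause scores are made-up numbers, not real explain output.

```python
def coord(matched_clauses, total_clauses):
    """Fraction of the bool query's clauses that a document matched."""
    return matched_clauses / total_clauses

def bool_score(clause_scores, total_clauses):
    """Sum the scores of the matching clauses, then scale by coord."""
    return sum(clause_scores) * coord(len(clause_scores), total_clauses)

# Three clauses in total: one "must" and two "should", as in the original query.
# Hypothetical scores: 0.5 for the must clause, 0.25 for a matching range clause.
only_must = bool_score([0.5], total_clauses=3)
must_and_range = bool_score([0.5, 0.25], total_clauses=3)

print(only_must, must_and_range)
```

So a document that also matches one range clause gains twice: the clause's score is added to the sum, and coord rises from 1/3 to 2/3.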

You can turn this off by setting disable_coord in bool queries:
http://www.elasticsearch.org/guide/reference/query-dsl/bool-query.html
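
A sketch of where the flag would go, assuming disable_coord sits alongside must and should inside the bool query as the page above describes. The clauses are copied from the original query, and this has not been run against a cluster.

```python
import json

bool_query = {
    "bool": {
        # Assumption: with this set, the summed clause scores are no longer
        # multiplied by the coord factor.
        "disable_coord": True,
        "must": [
            {"query_string": {"fuzzy_prefix_length": 3,
                              "query": "description:SEARCH_TERM"}}
        ],
        "should": [
            {"range": {"item.rating": {"gt": 1000, "boost": 100}}},
            {"range": {"item.rating": {"lt": 10, "boost": 0.01}}},
        ],
    }
}

print(json.dumps({"query": bool_query}, indent=2))
```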

clint

Brilliant, thanks. That nailed it!

On Wednesday, 30 January 2013 10:46:32 UTC, Clinton Gormley wrote:

[...]