Okay to use negative value for boost on boolean query?

I have a query template that currently returns results exactly as desired.
I've been given a requirement to very slightly downrank results that have
an optional boolean field set to True. The intent is to ensure that we
return everything that matches the query, but if multiple records match,
any with this flag set to true come up later in results. In case it's
relevant, most records will not contain this flag. I came up with the
following (simplified) version of my query which works great:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_field": {
              "query": "{{q}}"
            }
          }
        },
        {
          "bool": {
            "must": {
              "term": {
                "some_flag": true
              }
            },
            "boost": -0.00001
          }
        }
      ]
    }
  }
}

A colleague said that we learned in our Elasticsearch training last year
that we should avoid negative boosts, and I should rewrite the second
clause as follows:

{
  "bool": {
    "must_not": {
      "term": {
        "some_flag": false
      }
    },
    "boost": 0.00001
  }
}

I don't recall learning that, and this construction strikes me as less
performant, as it must modify most records instead of just the minority
that will have some_flag=true.

Because we're both relatively new to Elasticsearch, we'd very much
appreciate it if someone with more experience could weigh in. I'm happy
to change it if it's the right thing to do. I'm just not sure I believe
it is, and if so, why.

Thanks in advance.

-joel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2113c100-7283-464e-b989-d58853d24a17%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Negative boosts are not supported. The challenge in downranking is that
each boost value contributes to the score and pushes docs higher, even
when the boost value is very small or negative. This is not what you
would expect.

The trick for successful downranking is to reward all docs that do not
match the condition:

{
  "bool": {
    "must_not": {
      "term": {
        "some_flag": true
      }
    },
    "boost": 0.00001
  }
}

which is equivalent to

{
  "bool": {
    "must": {
      "term": {
        "some_flag": false
      }
    },
    "boost": 0.00001
  }
}

given that some_flag exists in all docs.

This clause means: reward all docs that do not match the condition
some_flag=true and push them higher in the result set. In other words,
penalize all docs that match the condition some_flag=true.

Jörg
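
For reference, a minimal sketch of how the must_not clause above might slot
into the simplified query from the original post. Moving the match clause
into "must" is an assumption here, not something stated in the thread: with
both clauses under "should", the flag clause on its own could be enough for
a document to match, which would pull in records that do not match the
search text at all.

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "my_field": {
              "query": "{{q}}"
            }
          }
        }
      ],
      "should": [
        {
          "bool": {
            "must_not": {
              "term": {
                "some_flag": true
              }
            },
            "boost": 0.00001
          }
        }
      ]
    }
  }
}
```

Every document matching my_field is still returned. Documents without
some_flag=true, including those where the field is missing, pick up the
small extra score and so sort ahead of otherwise equal flagged ones.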

On Mon, Mar 2, 2015 at 7:03 PM, Joel Potischman <joel.potischman@beatport.com> wrote:

[...]

Thanks Jörg, that makes sense.

I've made that change and it works but I'm still struggling to have scoring
behave the way I want. I simplified the query in my original post for
clarity. The actual query, with the new flag, is more like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "display_name.raw": {
              "query": "{{q}}",
              "type": "phrase"
            }
          }
        },
        {
          "match": {
            "display_name.raw_folded": {
              "boost": 5,
              "query": "{{q}}",
              "type": "phrase"
            }
          }
        },
        {
          "bool": {
            "must_not": {
              "term": {
                "some_flag": true
              }
            },
            "boost": 0.00001
          }
        }
      ]
    }
  }
}

I have the display_name field indexed two additional ways, "raw" and
"raw_folded". "raw" is an exact phrase match, and "raw_folded" is the same
thing with accents/diacritics stripped, so if a query matches raw_folded,
it will always match raw as well and score higher.

I want to use this new clause on some_flag to only slightly decrease
scoring, but I'm finding it very difficult to do so without wildly
swinging the scores of other records due to boost normalization. Ideally
I'd want the presence of this flag set to true to reduce the score by,
say, 1%. The exact number is not important; I just want to make sure that
when multiple records match, those with this flag set to true rank
slightly lower. Think of it as a tiebreaker flag.

I know about function_score queries, but that would be a major risk to
implement now, as a) I believe it would require a substantial rewrite of
my template, b) I've never used them before, and c) we are going live very
soon. If that's really the right way to do this I'll ticket it for after
launch, but I'm hopeful there's a way to do this that only involves minor
tweaks to our existing templates. Any (additional!) guidance is very much
appreciated!

-joel

On Monday, March 2, 2015 at 1:25:49 PM UTC-5, Jörg Prante wrote:

[...]

You are right about the function score query. It is surely the most
powerful way to manipulate scores the way you want.

Jörg
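
A minimal sketch of what such a function_score query could look like,
using the simplified my_field query from the original post rather than the
real template. The 0.99 weight is only an illustration of the "reduce the
score by about 1%" idea, and the "weight" key assumes an Elasticsearch
release that supports it; neither comes from the thread.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "my_field": {
            "query": "{{q}}"
          }
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "some_flag": true
            }
          },
          "weight": 0.99
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
```

The inner query decides which documents match and what their base score
is. The function applies only to documents passing the filter, so flagged
records have their score multiplied by 0.99 while all others keep their
original score. That makes the flag behave like a tiebreaker without
adding an extra scoring clause to the bool query.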

On Tue, Mar 3, 2015 at 2:26 AM, Joel Potischman <joel.potischman@beatport.com> wrote:

[...]

Thanks again. I was hoping for the easiest way, but the most powerful will
have to do! :)

Cheers,

-joel

On Tuesday, March 3, 2015 at 3:59:29 AM UTC-5, Jörg Prante wrote:

[...]