Sort by document boost


(Hannes Korte) #1

Hi all,

I have an index where every document has a certain document boost based
on some document importance score. With the usual search everything
works fine. But if the user browses the content using only filters
without a query string, I want the result to be sorted by the boost.

The match_all query ignores the boost value, as it assigns a constant
score to each document. Adding a descending sort on the boost field results in
undefined ordering with "-Infinity" as sort values.

I noticed that defining a field as the boost field removes that field's other
properties from the mapping, in this case its float type.

https://gist.github.com/hkorte/6592245

The simple solution is obviously to add this field twice (one as a boost
and one for sorting). But is there a more elegant solution? I would like
to avoid having this duplicated field in my model files. I already tried
using a multi field to no avail.

https://gist.github.com/hkorte/6592285

Any ideas?

Best regards,
Hannes

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

I submitted a pull request to provide this functionality a while ago, but
it was ignored by the elasticsearch team:
https://github.com/elasticsearch/elasticsearch/pull/2913

I use the two field solution currently.

Cheers,

Ivan


(Britta Weber) #3

Hi,

it is possible to access the _boost field with a script. The query
would look like this:

{
  "query": {
    "function_score": {
      "script_score": {
        "script": "_source._boost"
      },
      "boost_mode": "replace"
    }
  }
}

Alternatively, you could use the custom_score:

{
  "query": {
    "custom_score": {
      "query": {
        "match_all": {}
      },
      "script": "_source._boost"
    }
  }
}

Is this what you need?

Cheers,
Britta


(simonw-2) #4

I think all of the solutions here work. However, the script solution that
reads from the source will likely be slow, because it has to load the source
for each document. The purpose of the _boost field is to apply the boost value
set in that field to every field in the document, so that it is available to
Lucene scoring. @Ivan, I don't think we ignored your PR; I guess it somehow
got lost. I think @mvg is looking into it and we are discussing solutions
internally. Sorry for not responding to it. Please feel free to ping on
issues like this once in a while if they don't get attention.
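
A minimal sketch of how the two-field workaround avoids the source-loading cost: either sort directly on a separately indexed numeric copy of the boost, or let the script read that copy from field data through `doc[...]` instead of `_source`. The field name `boost_sort` and the class and method names are hypothetical, and the request bodies are built as plain strings rather than through the Elasticsearch Java client.

```java
class BrowseRequests {

    // Request body that returns all documents sorted by a numeric field, descending.
    // Assumes boostField is a regular indexed float field holding a copy of the boost.
    public static String sortByField(String boostField) {
        return "{"
            + "\"query\":{\"match_all\":{}},"
            + "\"sort\":[{\"" + boostField + "\":{\"order\":\"desc\"}}]"
            + "}";
    }

    // Request body that uses the field value as the score, read from field data
    // (doc['...'].value) rather than from _source.
    public static String scoreByFieldData(String boostField) {
        return "{"
            + "\"query\":{\"function_score\":{"
            + "\"script_score\":{\"script\":\"doc['" + boostField + "'].value\"},"
            + "\"boost_mode\":\"replace\""
            + "}}}";
    }

    public static void main(String[] args) {
        // "boost_sort" is a hypothetical field name used only for illustration.
        System.out.println(sortByField("boost_sort"));
        System.out.println(scoreByFieldData("boost_sort"));
    }
}
```

The plain sort is probably the simpler choice when browsing with filters only; the scripted variant matters when the boost should still take part in scoring.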

simon


(Hannes Korte) #5

Hi Ivan,

On 18.09.2013 04:26, Ivan Brusic wrote:

> I submitted a pull request to provide this functionality a while ago, but
> it was ignored by the elasticsearch team:
> https://github.com/elasticsearch/elasticsearch/pull/2913

Interesting. Good to know that I am not the only one having this
problem. :)

> I use the two field solution currently.

Yes, me too. I think this is the best solution so far. To avoid
duplicating the boost field in my model files, I simply added this line
after JSON serialization:

str = str.substring(0, str.length() - 1) + ",\"_boost\":" + boost + "}";

Not beautiful, but simple, and it works.
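
A slightly more defensive sketch of the string concatenation above, assuming the serialized document is a single JSON object that does not already contain a `_boost` member. The class and method names are hypothetical; the code only checks the outer braces and does not parse the JSON.

```java
class BoostJson {

    // Appends a numeric "_boost" member to an already serialized JSON object.
    // Assumes json is one JSON object without an existing "_boost" member.
    public static String withBoost(String json, float boost) {
        String trimmed = json.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("expected a JSON object");
        }
        if (Float.isNaN(boost) || Float.isInfinite(boost)) {
            // NaN and Infinity are not valid JSON numbers.
            throw new IllegalArgumentException("boost must be a finite number");
        }
        // Drop the closing brace so the new member can be appended.
        String body = trimmed.substring(0, trimmed.length() - 1);
        // An empty object needs no separating comma before the new member.
        String separator = body.trim().equals("{") ? "" : ",";
        return body + separator + "\"_boost\":" + boost + "}";
    }

    public static void main(String[] args) {
        System.out.println(withBoost("{\"title\":\"foo\"}", 2.5f)); // {"title":"foo","_boost":2.5}
        System.out.println(withBoost("{}", 1.0f));                  // {"_boost":1.0}
    }
}
```

The empty-object check is the main difference: the one-line version would produce `{,"_boost":...}` for `{}`.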

Hannes


(Martijn Van Groningen) #6

Hi Hannes,

Once PR 2913 gets in, you can just set the 'index' option to 'not_analyzed'
for the '_boost' field and then you should be able to sort by it.

Martijn

--
Kind regards,

Martijn van Groningen


(Ivan Brusic) #7

The workaround is simple and has only a minuscule effect on space, so the
pull request was not really important. For me, the change mattered more
because it would make the boost field behave consistently with
the timestamp field. If the pull request gets accepted, I guess we will
have eventual consistency. :rimshot:

Eventually, I need to move from document boosts to a custom score
such as the one Britta mentioned. There is a loss of precision when the
boost gets encoded in the field norms, and I also need to provide dynamic
boosts based on different fields. Easy to code (I already did it months ago),
but since there is some performance impact, I need to benchmark it, which I
do not have time for.
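
To illustrate the precision loss, here is a sketch that assumes the default similarity stores each norm in a single byte with a 3-bit mantissa and 5-bit exponent, as Lucene's `SmallFloat.floatToByte315` does. The methods below are a standalone reimplementation for illustration, not calls into Lucene, and the class name is hypothetical. In a real index the boost is also multiplied by the length normalization before encoding, which this sketch leaves out.

```java
class NormPrecision {

    // Encodes a float into one byte: 3 mantissa bits, 5 exponent bits, zero exponent 15.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Underflow: zero and negatives map to 0, tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // Overflow: clamp to the largest code.
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decodes a byte produced by floatToByte315 back into a float.
    public static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Value that survives one encode/decode cycle.
    public static float roundTrip(float boost) {
        return byte315ToFloat(floatToByte315(boost));
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.3f, 1.4f, 1.5f};
        for (float boost : boosts) {
            System.out.println(boost + " -> " + roundTrip(boost));
        }
    }
}
```

Under these assumptions, boosts of 1.3 and 1.4 both come back as 1.25, so documents with close importance scores become indistinguishable, which a script-based score avoids.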

My git-fu skills have deteriorated since I have been stuck in Perforce-land
for two years. Sorry about the messy commits; I will fix them soon. It is ironic,
since I consistently complain about not having a VCS with lightweight
branching and here I am messing up branching. :)

Cheers,

Ivan


(Ivan Brusic) #8

Thanks to Martijn for pushing out the change. I think that means it is time
to finally abandon document boost and switch to custom scoring. :)

