Is it impossible to implement a true multiplicative boost query when the weight is stored in a child document?


(Csaba Dezsényi) #1

Hello Everyone,

I would like to implement a popularity-based boost in my Elasticsearch
index. I periodically calculate custom popularity boost factors for
documents, but I store these float values in a child document because
I want to avoid a full reindex of the main article documents.

The mapping of the child document is the following:

{
  "document_boost": {
    "_parent": {
      "type": "document"
    },
    "properties": {
      "popular_boost_total": {
        "type": "float"
      },
      "popular_boost_recent": {
        "type": "float"
      },
      "last_updated": {
        "type": "date"
      }
    }
  }
}

I would like to create a query that:

  • executes the main query provided by the end user
  • attaches the child document (1-1 relation to the parent)
  • boosts the score of the main query by multiplying it with the custom
    boost factors read from the child document (popular_boost_total,
    popular_boost_recent)

I have been struggling with this for a while and could not find a really
clean solution. The best one I have found so far is the following
(simplified):

GET index/document/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "basketball"
          }
        }
      ],
      "should": [
        {
          "has_child": {
            "type": "document_boost",
            "query": {
              "function_score": {
                "script_score": {
                  "script": "doc['document_boost.popular_boost_total'].value"
                }
              }
            }
          }
        }
      ]
    }
  }
}

However, this is not a real boost, because the score of the should clause
is added to the primary query score, not multiplied with it! In this case
the amount of boost cannot be expressed as a clean percentage. It is a
noisy additive term, and the effective boosting factor depends on the
absolute score value of the particular query. So I think it is wrong.
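
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and sample scores are hypothetical, and the coord and query-norm factors that a bool query also applies are ignored.

```python
def additive_boost(query_score, boost):
    """Roughly what the bool query does: the should-clause score is added."""
    return query_score + boost


def multiplicative_boost(query_score, boost):
    """What a true boost would do: scale the main query score."""
    return query_score * boost


def relative_gain(boosted_score, query_score):
    """Factor by which the final score exceeds the main query score."""
    return boosted_score / query_score


if __name__ == "__main__":
    boost = 1.5  # hypothetical popularity factor from the child document
    for query_score in (0.5, 2.0, 10.0):
        added = relative_gain(additive_boost(query_score, boost), query_score)
        multiplied = relative_gain(multiplicative_boost(query_score, boost), query_score)
        print(f"score={query_score}: additive x{added:.2f}, multiplicative x{multiplied:.2f}")
```

With the additive form, the same factor of 1.5 lifts a weak match (score 0.5) fourfold but a strong match (score 10.0) by only 15 percent. With the multiplicative form, every match is lifted by exactly 50 percent.
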
I would be able to solve it if the custom boost factors were stored not in
child documents but in fields of the parent document:

GET index/document/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "basketball"
        }
      },
      "script_score": {
        "script": "doc['popular_boost_recent'].value"
      }
    }
  }
}

Obviously, in the above case we do not need the has_child query at all.
I also tried it with child documents but without the bool query:

GET index/document/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "basketball"
        }
      },
      "functions": [
        {
          "filter": {
            "has_child": {
              "type": "document_boost",
              "query": {"match_all": {}}
            }
          },
          "script_score": {
            "script": "doc['document_boost.popular_boost_recent'].value"
          }
        }
      ]
    }
  }
}

In the above case, the script reads the value from the parent document, not
from the child! This looks like a bug to me, since I explicitly specify the
fully qualified field name.

Considering what the query API syntax allows, I think the last query above
would be the right way to get a real multiplicative boost, but it simply
does not work.
Another solution would be to define the score mode of the bool query, i.e.
to tell Elasticsearch to multiply the scores of its clauses instead of
adding them.

Is anyone else facing the same issue? I think popularity boosts and other
kinds of custom boosts are a common requirement nowadays.
Can somebody give me a hint? I hope I have just misunderstood something...

Thanks!

Regards,
Csaba

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/af4a19e4-1b1c-4702-a016-c88a6c76d04b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Csaba Dezsényi) #2

I could find only one related post:
https://groups.google.com/forum/#!msg/elasticsearch/EGCeJZbhVtA/i32ROGVmFswJ
But it asks a different question...



(Ivan Brusic) #3

Did you change the boost_mode of your function score script? The default
should be "multiply", which is the behavior you want, not "sum", which is
what you are experiencing.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

I have never used it with nested documents, so perhaps it is a bug (or a
feature :) )

--
Ivan



(Csaba Dezsényi) #4

Thanks for the tip, Ivan, but I think the boost_mode is fine in my
queries. The problem is that I can only access the fields of the child
document if I add a separate bool clause containing the has_child query,
and that clause is what causes the sum. Within that clause, the custom
score is multiplied with the has_child query score, which is correct.

I also think that this is a bug.

Thanks,
Csaba



(cdebry) #5

I have the exact same issue except that I need to boost a child query based on a value in the parent. Sadly, I went through the same exercise and came to the same conclusions.

I agree that the last query is the correct approach. At first, I assumed that the "has_child" filter was out of scope with the function; however, it recognized the field name without throwing an error. The issue is that it's not returning the field value, so it defaults to 1 and effectively doesn't have any impact on the score.

This definitely seems to be a bug. Have you logged it here?


(cdebry) #6

I found a workaround using rescore. It's not ideal, but with a large enough window, it should yield good results. Here's your query again, rewritten with a rescore.

GET index/document/_search
{
  "query": {
    "match": {
      "title": "basketball"
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "score_mode": "multiply",
      "rescore_query": {
        "has_child": {
          "type": "document_boost",
          "query": {
            "function_score": {
              "script_score": {
                "script": "doc['document_boost.popular_boost_recent'].value"
              }
            }
          }
        }
      }
    }
  }
}
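
To show why the window size matters for this workaround, here is a rough Python sketch of what a multiplying rescore does to a ranked hit list. It is a simplified model, not Elasticsearch code: the function name, the sample hits, and the boost values are hypothetical, both rescore weights are assumed to be 1, and per-shard behaviour is ignored.

```python
def rescore_multiply(hits, boosts, window_size):
    """Simulate a rescore with score_mode "multiply" on a ranked hit list.

    hits: list of (doc_id, score) tuples, sorted by descending score.
    boosts: dict mapping doc_id to its boost factor (hypothetical values
            standing in for the child document's field).
    Hits beyond window_size are assumed to keep their score and order.
    """
    window = hits[:window_size]
    rest = hits[window_size:]
    # Only hits inside the window are multiplied by their boost factor;
    # a hit without a boost entry keeps its original score.
    rescored = [(doc_id, score * boosts.get(doc_id, 1.0)) for doc_id, score in window]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored + rest


if __name__ == "__main__":
    hits = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
    boosts = {"a": 1.0, "b": 2.0, "c": 10.0}
    print(rescore_multiply(hits, boosts, window_size=2))
    print(rescore_multiply(hits, boosts, window_size=3))
```

In this model a window of 2 never sees document "c", so its large boost is ignored and it stays last. With a window of 3 it moves to the top. That is the sense in which the workaround depends on a large enough window.
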

