function_score query: _score gets rounded to an integer in script_score


(acv2) #1

Hi there,
I'm back with another annoying question.

A little bit of context:
Elasticsearch version: 1.5
Query: function_score with a script_score that applies a condition on a field (e.g. $field == null ? 1 : 2)

I started my tests around that and it seemed to work until I dug deeper into it. My findings are the following.

I was using the following query:

{
  "size": 10,
  "_source": [
    "doc._id",
    "doc.title",
    "doc.geo_country",
    "doc.geo_city",
    "doc.ppc"
  ],
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "+admin* AND ppc:>0"
        }
      },
      "script_score": {
        "script": "(doc['ppc'].value*1) * (doc['geo_city'].value == 'null' ? 1 : 2)"
      }
    }
  }
}

It seemed to work, and we ran with it for a while. Then I came back to it and started testing more deeply how the script interacts with the rest of the query. By default boost_mode is multiply, so whatever script_score returns is multiplied by the query score, which is exactly what we wanted. For testing, though, I set boost_mode to replace and used the _score variable inside the script to multiply by my constant, which should give the same result.
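To make the two boost modes concrete, here is a small Python sketch of how function_score combines the query score with the script result. The function name combine is made up for illustration and only the two modes discussed here are modeled; it is not Elasticsearch code.

```python
def combine(query_score: float, function_score: float, boost_mode: str = "multiply") -> float:
    """Toy model of how function_score merges the query score and the script result."""
    if boost_mode == "multiply":   # default: query score times script result
        return query_score * function_score
    if boost_mode == "replace":    # query score is ignored, script result is final
        return function_score
    raise ValueError(f"mode not modeled in this sketch: {boost_mode}")


query_score = 2.0

# Script returns the constant 5 and the default mode multiplies it in.
with_multiply = combine(query_score, 5.0)

# Script returns _score * 5 itself, and replace keeps only that value.
with_replace = combine(query_score, query_score * 5.0, "replace")

print(with_multiply, with_replace)
```

Both paths are expected to give the same number, which is why replace plus "_score * 5" looked like a safe way to test.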

Then we started to notice weird behavior when multiplying the score by a constant, e.g.:
{
  "size": 10,
  "_source": [
    "doc._id",
    "doc.title",
    "doc.geo_country",
    "doc.geo_city",
    "doc.ppc"
  ],
  "sort": [
    {
      "_score": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "query_string": {
          "query": "+php"
        }
      },
      "script_score": {
        "script": "_score * 5"
      }
    }
  }
}

The highest-scoring result was then:
{
  "_index": "jobs02",
  "_type": "document",
  "_id": "enrc9ewffbasdf",
  "_score": 10,
  "_source": {
    "doc": {
      "title": "PHP BACKEND ENTWICKLER (m/w) für Hamburg",
      "ppc": 0,
      "geo_city": "Hamburg",
      "geo_country": "DE"
    }
  }
}

while the lowest were all set to 0 (note that I'm using replace mode, to avoid redundancy). That was weird, so I changed the script from "_score * 5" to just "_score" and got:

{
  "_index": "jobs02",
  "_type": "document",
  "_id": "enrc9ewffb",
  "_score": 2.013448,
  "_source": {
    "doc": {
      "title": "PHP BACKEND ENTWICKLER (m/w) für Hamburg",
      "ppc": 0,
      "geo_city": "Hamburg",
      "geo_country": "DE"
    }
  }
}

which is great, but I noticed that when the script multiplies by 5 it only takes the integer part of _score: even if the original score was 2.9, it only uses 2. That is a huge difference when you want to improve the quality of your queries. Documents with scores anywhere from 1.1 to 1.9 all end up with the same score, which is really bad. Worse, a document with a low base _score but a really large boost for its importance gets dragged down to a score of 0 just because its base score was 0.99999; no matter what boost you set, the result is 0, which is catastrophic.
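Here is a minimal Python sketch of the effect, assuming the script behaves as if it dropped the decimals of _score before multiplying (that is what the numbers above suggest; I have not confirmed it in the Elasticsearch source). The function names truncated_boost and float_boost are made up for illustration.

```python
import math


def truncated_boost(score: float, factor: int) -> int:
    """What the script appeared to do: drop the decimals of _score, then multiply."""
    return math.trunc(score) * factor


def float_boost(score: float, factor: float) -> float:
    """What we actually wanted: multiply the full floating-point score."""
    return score * factor


# Sample scores taken from the cases described above.
for score in (0.99999, 1.1, 1.9, 2.9):
    print(score, truncated_boost(score, 5), float_boost(score, 5.0))
```

With truncation, 1.1 and 1.9 land on the same value and 0.99999 collapses to 0, while the floating-point version keeps the documents in distinct positions.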

The light at the end of the tunnel: I've been working with Elasticsearch for a while and I know there is almost always a workaround, yet I couldn't find anything in the documentation or on the forums. That is the main reason I'm writing this post: to help others who could be facing the same problem without even knowing it.
Solution: when none of my research gave results, I started testing on my side and decided to write the decimals explicitly to see what would happen, and guess what? It WORKED. Then I realized that maybe the rounding is intended and I'm just not seeing the value of it.
In a nutshell, if you write the constant with an explicit decimal part, it works perfectly:

{
  "size": 10,
  "_source": [
    "doc._id",
    "doc.title",
    "doc.geo_country",
    "doc.geo_city",
    "doc.ppc"
  ],
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "query_string": {
          "query": "+php AND ppc:>0"
        }
      },
      "script_score": {
        "script": "_score*5.0"
      }
    }
  }
}

Note the 5.0 in the script: just adding the decimal part makes it work flawlessly.
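If the query is built in application code, one way to avoid reintroducing the problem is to always render the multiplier as a float literal. The sketch below is Python; build_boost_query is a hypothetical helper, not part of any Elasticsearch client, and it only builds and prints the request body without sending it anywhere.

```python
import json


def build_boost_query(query_text: str, factor: float) -> dict:
    """Build the function_score body with the multiplier written as a float literal."""
    # repr(float(5)) is '5.0', so an integer factor still gets its decimal point.
    literal = repr(float(factor))
    return {
        "size": 10,
        "sort": [{"_score": {"order": "desc"}}],
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "query": {"query_string": {"query": query_text}},
                "script_score": {"script": f"_score * {literal}"},
            }
        },
    }


body = build_boost_query("+php AND ppc:>0", 5)
print(json.dumps(body, indent=2))
```

Passing 5 or 5.0 produces the same script string, so the integer form never reaches the cluster by accident.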

I hope to hear all your doubts and feedback so we can keep growing in this family =)

Regards
Daniel

