Hi,
I am new to Elasticsearch and I am having difficulty getting my search query to work as expected.
I want to search documents that contain these two fields:
"review_count" : {
  "type" : "long"
},
"stars" : {
  "type" : "float"
}
I would like to return documents that match my search and prioritize those with the highest stars value that also have a high review_count value. This of course requires a trade-off between the two.
I tried using a "function_score" query with "field_value_factor" functions to boost on those values:
{
  "size": 3,
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "field": "stars",
            "factor": 10,
            "missing": 1
          }
        },
        {
          "field_value_factor": {
            "field": "review_count",
            "factor": 1.2,
            "modifier": "sqrt",
            "missing": 1
          }
        }
      ],
      "query": {
        "multi_match" : {
          "query" : "restaurant",
          "fields" : ["name", "categories"]
        }
      }
    }
  },
  "_source": [
    "stars",
    "review_count"
  ]
}
But this query does not give the right trade-off, and I don't have a clear idea of how to proceed. It returns:
{
  "_score" : 7810.9053,
  "_source" : {
    "review_count" : 3771,
    "stars" : 4.0
  }
},
{
  "_score" : 6566.245,
  "_source" : {
    "review_count" : 1213,
    "stars" : 4.0
  }
},
{
  "_score" : 6180.8545,
  "_source" : {
    "review_count" : 1483,
    "stars" : 4.5
  }
}
Ideally, the third result would rank first, since it has 0.5 more stars and still has a high review_count (1483).
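As a sanity check on where these scores come from, here is a minimal Python sketch that recomputes the function part of the score for the three hits. It assumes the function_score defaults, which the query does not override: score_mode "multiply" (the two functions are multiplied together) and boost_mode "multiply" (their product is multiplied by the text score). It also assumes the modifier is applied to factor * value. The helper names field_value_factor and function_product are made up for this illustration.

```python
import math

def field_value_factor(value, factor=1.0, modifier=None):
    """Mimic field_value_factor: the modifier is applied to factor * value."""
    scaled = factor * value
    return math.sqrt(scaled) if modifier == "sqrt" else scaled

def function_product(stars, review_count):
    """Product of the two functions from the query (assumed score_mode: multiply)."""
    stars_part = field_value_factor(stars, factor=10)
    count_part = field_value_factor(review_count, factor=1.2, modifier="sqrt")
    return stars_part * count_part

# (stars, review_count, _score) copied from the three hits above
hits = [(4.0, 3771, 7810.9053), (4.0, 1213, 6566.245), (4.5, 1483, 6180.8545)]

for stars, review_count, score in hits:
    product = function_product(stars, review_count)
    # Assumed boost_mode: multiply, so _score = text score * product
    implied_text_score = score / product
    print(f"stars={stars} review_count={review_count} "
          f"functions={product:.1f} implied_text_score={implied_text_score:.2f}")
```

Under those assumptions, the unbounded sqrt term lets a review_count of 3771 outweigh half a star, and the implied text scores (roughly 2.9, 4.3 and 3.3) differ enough to push the 4.5-star document below the 4.0-star document with 1213 reviews, even though the functions alone would rank it higher.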
I also tried a "script_score" query with a script that uses min-max scaling, but the results don't look right at all:
{
  "size": 3,
  "query": {
    "script_score": {
      "query": {
        "multi_match" : {
          "query" : "restaurant",
          "fields" : ["name", "categories"]
        }
      },
      "script": {
        "params": {
          "stars_factor": 0.6,
          "review_count_factor": 0.4
        },
        "source": """
          (((doc['stars'].value + 1) - Math.min(doc['stars'].value, doc['review_count'].value))
            / (Math.max(doc['stars'].value, doc['review_count'].value) - Math.min(doc['stars'].value, doc['review_count'].value)))
            * params.stars_factor
          + (((doc['review_count'].value + 1) - Math.min(doc['stars'].value, doc['review_count'].value))
            / (Math.max(doc['stars'].value, doc['review_count'].value) - Math.min(doc['stars'].value, doc['review_count'].value)))
            * params.review_count_factor
        """
      }
    }
  },
  "_source": [
    "stars",
    "review_count"
  ]
}
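To see what this script actually computes, here is a rough Python transcription of the Painless expression, evaluated on the three documents from the earlier results. The min and max are taken between the two fields of the same document rather than across the whole index, which is probably not what min-max scaling is meant to do. For comparison, dataset_scaled applies min-max scaling with index-wide bounds. Those bounds (stars between 1 and 5, at most 4000 reviews) are hypothetical values chosen for illustration, not taken from the real data, and all function names are made up.

```python
def posted_script(stars, review_count, stars_factor=0.6, review_count_factor=0.4):
    """Transcription of the Painless source: min/max are per document."""
    lo = min(stars, review_count)
    hi = max(stars, review_count)
    stars_term = ((stars + 1) - lo) / (hi - lo)
    count_term = ((review_count + 1) - lo) / (hi - lo)
    return stars_term * stars_factor + count_term * review_count_factor

def min_max(value, lo, hi):
    """Standard min-max scaling of value into [0, 1] given dataset-wide bounds."""
    return (value - lo) / (hi - lo)

def dataset_scaled(stars, review_count, max_reviews,
                   stars_factor=0.6, review_count_factor=0.4):
    """Weighted sum of both fields, each scaled with HYPOTHETICAL index-wide bounds."""
    scaled_stars = min_max(stars, 1.0, 5.0)               # assumed star range
    scaled_count = min_max(review_count, 0, max_reviews)  # assumed review range
    return scaled_stars * stars_factor + scaled_count * review_count_factor

# (stars, review_count) of the three documents from the function_score results
docs = [(4.0, 3771), (4.0, 1213), (4.5, 1483)]

for stars, review_count in docs:
    print(stars, review_count,
          round(posted_script(stars, review_count), 4),
          round(dataset_scaled(stars, review_count, max_reviews=4000), 4))
```

With the per-document min and max, every one of these documents ends up at about 0.4, so the script barely distinguishes them. As far as I understand, script_score also uses only the script's return value as the final score, so the multi_match relevance is ignored unless the script reads _score.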
Thank you for your help!