Getting the max theoretical score for a search query

Guy_Bowden · April 10, 2014, 3:02pm

Hi there,

Is there any way to work out the maximum theoretical score for a non-text
search query, something like:

"query": {
    "query_string": {
        "query": "location:(1)^3 accommodation_comfort:(137 OR 193)^2 accommodation_facilities:(459 OR 403 OR 319)",
        "default_operator": "OR",
        "default_field": "text",
        "auto_generate_phrase_queries": true,
        "analyze_wildcard": true
    }
}

At the moment I'm using Django Haystack to generate the query - so I could
build it differently if that helps.

Or is there a simple bit of maths I could do based on the number of
attributes and the boosts I'm giving them (assuming that there's no
extra boosting going on at index time)?

Each document attribute I'm querying is just an array of integers.

The search is working fine - I'd just like to be able to know if I have a
100% match, or a 50% match and so on so I can decide if I want to use the
returned doc in the next step of the process.

For example:

If the max score is 2, I want to do something with the top 40 docs that
score over 1.5 (if only 20 score over 1.5, that's fine, just those 20; if
60 do, then only the top 40).

As the query changes, the max score changes too. "query":
"activity:(1) location:(1)^3" gets me a max score of 1.264911 because
there's not much to search on (I know it's the max because I can see the
top results match exactly), but a more complex search like "query":
"activity:(1) location:(2)^3 accommodation_facilities:(459 OR 347 OR 319)^1
accommodation_comfort:(193)^2" scores 2.1828206 for a 100% match.
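
To make the selection rule above concrete, here is a minimal Python sketch. It assumes the hits arrive as dicts with a "_score" key (the shape of Elasticsearch search hits) and that the maximum score for the query is already known; select_hits and its parameters are hypothetical names for illustration, not part of Haystack or Elasticsearch.

```python
def select_hits(hits, max_score, min_fraction=0.75, limit=40):
    """Keep at most `limit` hits scoring above min_fraction * max_score.

    With max_score=2 and min_fraction=0.75 the threshold is 1.5,
    as in the example above.
    """
    threshold = min_fraction * max_score
    # Keep only hits strictly above the threshold.
    good = [h for h in hits if h["_score"] > threshold]
    # Best matches first, then cap the list at `limit` entries.
    good.sort(key=lambda h: h["_score"], reverse=True)
    return good[:limit]
```

The hard part is not this filter but knowing max_score for an arbitrary query, which is what the question is really about.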

Thanks

Guy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/075dd752-a6a2-4044-862a-257a2029ee12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · April 10, 2014, 11:57pm

Hi,

Although it is generally not advised to return scores as percentages[1], in
this particular case it would work given that you are only querying
structured attributes and don't care about term frequencies and so on.

For this problem, I would recommend using the function_score query[2],
which lets you decide how scores are computed. Since you control
the score computation, you know the maximum score, which makes it
possible to display the score as a percentage to your users.

[1] ScoresAsPercentages - Apache Lucene (Java) - Apache Software Foundation
[2] Elasticsearch Platform - Find real-time answers at scale | Elastic
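
As a rough illustration of the function_score approach, the Python sketch below builds a request body in which each attribute contributes a fixed weight when any of its values match, so the maximum score is simply the sum of the weights. Several things here are assumptions rather than anything stated in the thread: build_query and as_percentage are hypothetical helpers, an attribute scores all-or-nothing (matching one of several OR'd values earns the full weight), and the "weight" key is the spelling used by later Elasticsearch releases, so check the function_score documentation for the version you run.

```python
def build_query(criteria):
    """Build a function_score body from (field, values, weight) tuples.

    Returns (body, max_score). Sketch only: key names follow the
    function_score syntax of later Elasticsearch releases.
    """
    # One scoring function per attribute: a fixed weight if any value matches.
    functions = [
        {"filter": {"terms": {field: values}}, "weight": weight}
        for field, values, weight in criteria
    ]
    # The base query only decides which docs are returned at all.
    should = [{"terms": {field: values}} for field, values, _ in criteria]
    body = {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should, "minimum_should_match": 1}},
                "functions": functions,
                "score_mode": "sum",      # add up the weights of matching functions
                "boost_mode": "replace",  # ignore the base query's own score
            }
        }
    }
    # A doc matching every attribute scores the sum of all weights.
    max_score = float(sum(weight for _, _, weight in criteria))
    return body, max_score


def as_percentage(score, max_score):
    """Express a hit's score as a percentage of the known maximum."""
    return 100.0 * score / max_score


body, max_score = build_query([
    ("location", [1], 3),
    ("accommodation_comfort", [137, 193], 2),
    ("accommodation_facilities", [459, 403, 319], 1),
])
```

With the weights 3, 2 and 1 from the original query, max_score is 6.0, and a document matching only location would come out at 50%.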

--
Adrien Grand

Guy_Bowden · April 15, 2014, 1:14pm

Many thanks Adrien
