Getting the max theoretical score for a search query


(Guy Bowden) #1

Hi there,

Is there any way to work out the maximum theoretical score for a non-text
search query, something like:

"query": {
    "query_string": {
        "query": "location:(1)^3 accommodation_comfort:(137 OR 193)^2 accommodation_facilities:(459 OR 403 OR 319)",
        "default_operator": "OR",
        "default_field": "text",
        "auto_generate_phrase_queries": true,
        "analyze_wildcard": true
    }
}

At the moment I'm using Django Haystack to generate the query - so I could
build it differently if that helps.

Or is there a simple bit of maths I could do based on the number of
attributes and the boost values I'm giving (assuming that there's no
extra boosting going on at index time)?

Each document attribute I'm querying is just an array of integers.

The search is working fine - I'd just like to be able to know if I have a
100% match, or a 50% match and so on so I can decide if I want to use the
returned doc in the next step of the process.

For example, if the max score is 2, I want to do something with the top 40
docs that score over 1.5 (if only 20 score over 1.5, that's fine, just those
20; if 60 do, then only the top 40).
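
The selection rule above can be sketched in a few lines of Python. This is only an illustration, assuming the maximum score is already known and that each hit is a dict with a `_score` key, as in an Elasticsearch search response; the function name `select_matches` and its parameters are made up for this example.

```python
def select_matches(hits, max_score, min_fraction=0.75, limit=40):
    """Return at most `limit` hits scoring above min_fraction * max_score.

    With max_score=2 and min_fraction=0.75 the cut-off is 1.5, as in the
    example above. Hits are returned best first.
    """
    threshold = min_fraction * max_score
    good = [hit for hit in hits if hit["_score"] > threshold]
    good.sort(key=lambda hit: hit["_score"], reverse=True)
    return good[:limit]


# Hypothetical data: 60 hits above the cut-off and 10 below it.
sample_hits = [{"_score": 1.6}] * 60 + [{"_score": 1.0}] * 10
selected = select_matches(sample_hits, max_score=2)
```

If fewer than `limit` hits clear the cut-off, all of them are returned and nothing is padded in from below the threshold.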

And as the query can change, the max score changes. For example, "query":
"activity:(1) location:(1)^3" gets me a max score of 1.264911 because
there's not much to search on (I know because I can see the top results
match exactly), but a more complex search like "query": "activity:(1)
location:(2)^3 accommodation_facilities:(459 OR 347 OR 319)^1
accommodation_comfort:(193)^2" scores 2.1828206 for a 100% match.

Thanks

Guy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/075dd752-a6a2-4044-862a-257a2029ee12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

Hi,

Although it is generally not advised to return scores as percentages[1], in
this particular case it would work given that you are only querying
structured attributes and don't care about term frequencies and so on.

For this problem, I would recommend using the function_score query[2],
which lets you decide how scores are computed. Since you control the score
computation, you know the maximum score, which makes it possible to display
the score as a percentage to your users.

[1] http://wiki.apache.org/lucene-java/ScoresAsPercentages
[2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/query-dsl-function-score-query.html
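
As a rough sketch of how this could look, the Python below builds a function_score request body in which every (field, value) pair adds a fixed weight to the score, so the maximum is simply the sum of all the weights. Several things here are assumptions rather than anything stated in this thread: the helper name `build_query`, the choice of one function per value (so a document matching both 137 and 193 scores higher than one matching only 137), and the `weight` key, which depends on your Elasticsearch version (some 1.x releases spell it `boost_factor`).

```python
def build_query(criteria):
    """Build a function_score body and its maximum possible score.

    `criteria` is a list of (field, values, boost) tuples. Each matching
    value contributes `boost` to the score; nothing else is counted.
    """
    should = []
    functions = []
    for field, values, boost in criteria:
        # The inner query only selects documents; its score is discarded.
        should.append({"terms": {field: list(values)}})
        for value in values:
            functions.append({"filter": {"term": {field: value}},
                              "weight": boost})
    body = {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should,
                                   "minimum_should_match": 1}},
                "functions": functions,
                "score_mode": "sum",      # add up the matching weights
                "boost_mode": "replace",  # ignore the inner query's score
            }
        }
    }
    max_score = sum(boost * len(values) for _, values, boost in criteria)
    return body, max_score


# The fields and boosts from the question: 3*1 + 2*2 + 1*3 = 10.
criteria = [
    ("location", [1], 3),
    ("accommodation_comfort", [137, 193], 2),
    ("accommodation_facilities", [459, 403, 319], 1),
]
body, max_score = build_query(criteria)
```

A hit's percentage is then `100 * hit["_score"] / max_score`. If a field should count as fully matched when any one of its values matches, emit a single function per field with a `terms` filter instead, and the maximum becomes the sum of the boosts.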

On Thu, Apr 10, 2014 at 5:02 PM, Guy Bowden guy@gentianesolutions.com wrote:

--
Adrien Grand



(Guy Bowden) #3

Many thanks, Adrien.

On Friday, April 11, 2014 1:57:34 AM UTC+2, Adrien Grand wrote:



(system) #4