Is there any way to figure out a maximum theoretical score for a non-text
search query? Something like:
    "query": {
        "query_string": {
            "query": "location:(1)^3 accommodation_comfort:(137 OR 193)^2 accommodation_facilities:(459 OR 403 OR 319)",
            "default_operator": "OR",
            "default_field": "text",
            "auto_generate_phrase_queries": true,
            "analyze_wildcard": true
        }
    }
At the moment I'm using Django Haystack to generate the query - so I could
build it differently if that helps.
Or is there a simple bit of maths I could do based on the number of
attributes and the boost scores I'm giving - (assuming that there's no
extra boosting going on at index time)?
Each document attribute I'm querying is just an array of integers.
The search is working fine - I'd just like to be able to know if I have a
100% match, or a 50% match and so on so I can decide if I want to use the
returned doc in the next step of the process.
For example, if the max score is 2, I want to do something with the top 40
docs that score over 1.5 (if only 20 score over 1.5, that's fine, just those
20; if 60 do, then only the top 40).
As the query can change, the max score changes too. The query
"activity:(1) location:(1)^3" gets me a max score of 1.264911 because
there's not much to search on (I know because I can see that the top
results match exactly), but a more complex search like
"activity:(1) location:(2)^3 accommodation_facilities:(459 OR 347 OR 319)^1
accommodation_comfort:(193)^2" scores 2.1828206 for a 100% match.
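To make the selection rule above concrete, here is a minimal Python sketch. It assumes the hits are already available as (doc_id, score) pairs and that the maximum score for the query is known; the names select_hits, min_ratio and limit are made up for illustration and are not part of Haystack or Elasticsearch.

```python
def select_hits(hits, max_score, min_ratio=0.75, limit=40):
    """Keep at most `limit` hits whose score is strictly above
    min_ratio * max_score, best scores first.

    hits: iterable of (doc_id, score) pairs (assumed shape).
    """
    threshold = max_score * min_ratio
    good = [hit for hit in hits if hit[1] > threshold]
    good.sort(key=lambda hit: hit[1], reverse=True)
    return good[:limit]


# With a max score of 2 and a ratio of 0.75 the cut-off is 1.5.
example_hits = [("a", 2.0), ("b", 1.5), ("c", 1.6), ("d", 0.4)]
selected = select_hits(example_hits, max_score=2.0)
```

With the example data only "a" and "c" are kept, because "b" sits exactly on the cut-off and the rule asks for scores over it.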
On Friday, April 11, 2014 1:57:34 AM UTC+2, Adrien Grand wrote:
Hi,
Although it is generally not advised to return scores as percentages[1],
in this particular case it would work given that you are only querying
structured attributes and don't care about term frequencies and so on.
For this problem, I would recommend using the function_score query[2],
which lets you decide how scores are computed. Since you control the score
computation, you know the maximum score, which makes it possible to display
the score as a percentage to your users.
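As a rough illustration of that suggestion, the sketch below builds a function_score request body in Python in which every attribute group contributes a fixed weight when at least one of its values matches. It is one possible reading of the advice, not a definitive implementation: the helper name build_function_score_query and the (field, values, weight) tuple shape are invented here, the weights simply reuse the boosts from the question, and the exact DSL keys depend on the Elasticsearch version (older releases used boost_factor where newer ones use weight).

```python
def build_function_score_query(clauses):
    """Build a function_score body plus its known maximum score.

    clauses: list of (field, values, weight) tuples (hypothetical shape).
    """
    # One scoring function per attribute group: a matching document
    # earns the group's weight once, however many values match.
    functions = [
        {"filter": {"terms": {field: list(values)}}, "weight": weight}
        for field, values, weight in clauses
    ]
    body = {
        "query": {
            "function_score": {
                # Only consider documents matching at least one group.
                "query": {
                    "bool": {
                        "should": [f["filter"] for f in functions],
                        "minimum_should_match": 1,
                    }
                },
                "functions": functions,
                "score_mode": "sum",      # add up the matching weights
                "boost_mode": "replace",  # ignore the base query score
            }
        }
    }
    # A document matching every group scores the sum of all weights.
    max_score = sum(weight for _, _, weight in clauses)
    return body, max_score


clauses = [
    ("location", [1], 3),
    ("accommodation_comfort", [137, 193], 2),
    ("accommodation_facilities", [459, 403, 319], 1),
]
body, max_score = build_function_score_query(clauses)


def as_percentage(score, max_score):
    """Express a returned score as a percentage of the maximum."""
    return 100.0 * score / max_score
```

Under these assumptions a document matching all three groups scores 6, so a hit that comes back with a score of 3 would be a 50% match. Note that this treats "137 OR 193" as a single yes/no condition, which differs from the default scoring where matching more terms raises the score.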