Hi!
I'm working on a system that searches for geographical places, and here's the problem I'm having. I match every term in a query against fields like country and city. Let's say I'm looking for "Cancun, Mexico". I can't use multiterm query rewrite (there are reasons for that), so my query looks like this:
{
  "bool" : {
    "must" : {
      "dis_max" : {
        "queries" : [
          {"term" : { "city" : "Cancun" }},
          {"term" : { "city" : "Mexico" }}
        ]
      }
    },
    "should" : {
      "dis_max" : {
        "queries" : [
          {"term" : { "country" : "Cancun" }},
          {"term" : { "country" : "Mexico" }}
        ]
      }
    }
  }
}
Among my results are two cities: {"country" : "Mexico", "city" : "Mexico"} (the capital) and {"country" : "Mexico", "city" : "Cancun"}. Term frequencies aside, both locations get essentially the same score, because in both cases there is exactly one country match and one city match.
The actual queries I'm using are a bit more complicated, but the problem is the same: how do I reward documents where more (or all) of the original terms matched? I can pass all terms to the scoring script if necessary, I just don't know how to find out which of those terms actually matched in a given document.
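To make the tie concrete, here's a rough sketch in plain Python. It is not Elasticsearch code. It assumes every matching term clause scores 1.0 (so term frequencies, IDF and norms are ignored) and that dis_max uses no tie breaker. The helper names (dis_max, query_score, covered_terms) are made up for this illustration.

```python
# Illustration only: plain Python, not Elasticsearch code.
# Assumption: every matching term clause scores 1.0 (TF/IDF and norms ignored).

QUERY_TERMS = ["Cancun", "Mexico"]
FIELDS = ["city", "country"]


def dis_max(doc, field, terms):
    """Best score among the term clauses on one field (no tie breaker)."""
    return max(1.0 if doc[field] == term else 0.0 for term in terms)


def query_score(doc, terms=QUERY_TERMS):
    """Sum of the 'must' (city) and 'should' (country) dis_max scores.

    A real bool query would drop documents that fail the 'must' clause;
    here such a document would simply get 0.0 for that part.
    """
    return dis_max(doc, "city", terms) + dis_max(doc, "country", terms)


def covered_terms(doc, terms=QUERY_TERMS, fields=FIELDS):
    """Distinct query terms that match at least one field of the document."""
    return {term for term in terms if any(doc[f] == term for f in fields)}


mexico_city = {"country": "Mexico", "city": "Mexico"}
cancun = {"country": "Mexico", "city": "Cancun"}

for doc in (mexico_city, cancun):
    print(doc["city"], query_score(doc), len(covered_terms(doc)))
```

Under these assumptions both documents score 2.0, but the capital covers only one of the two query terms ("Mexico") while Cancun covers both. That coverage count is the signal I'd like to feed into the score.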