Find out how many given terms are found in the document


(Beowulfenator) #1

Hi!

I'm working on a system that searches for geographical places. Here's the problem I'm having. I'm matching every term in the query against fields such as country and city. Let's say I'm looking for "Cancun, Mexico". I can't use multiterm query rewrite (there are reasons for that), so my query looks like this:

{"bool" : {
    "must" : {
        "dis_max" : {
            "queries" : [
                {"term" : { "city" : "Cancun" }},
                {"term" : { "city" : "Mexico" }}
            ]
        }
    },
    "should" : {
        "dis_max" : {
            "queries" : [
                {"term" : { "country" : "Cancun" }},
                {"term" : { "country" : "Mexico" }}
            ]
        }
    }
}}

Among my results are two cities: {"country" : "Mexico", "city" : "Mexico"} (the capital) and {"country" : "Mexico", "city" : "Cancun"}. Term frequencies aside, both locations get essentially the same score, because in both cases there is one city match and one country match. Yet the Cancun document matches both query terms, while the capital matches only "Mexico" (twice).
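To make the tie concrete, here is a toy model of the query above in Python. It is a sketch under a simplifying assumption: every matching `term` clause contributes a flat score of 1.0 (no term frequency, IDF, or field-length normalization), `dis_max` keeps only the best-scoring sub-query (as it does with its default `tie_breaker` of 0), and `bool` adds the `must` and `should` scores together. The helper names are hypothetical and are not part of any Elasticsearch client.

```python
# Toy model of the bool/dis_max query; real Lucene scores also depend on tf, idf and norms.
QUERY_TERMS = ["Cancun", "Mexico"]

DOCS = {
    "Mexico City": {"country": "Mexico", "city": "Mexico"},
    "Cancun": {"country": "Mexico", "city": "Cancun"},
}


def term_score(doc: dict, field: str, term: str) -> float:
    """Flat score: 1.0 if the field holds exactly this term, else 0.0."""
    return 1.0 if doc.get(field) == term else 0.0


def dis_max_score(doc: dict, field: str, terms: list) -> float:
    """dis_max with tie_breaker 0: keep only the best-matching term clause."""
    return max(term_score(doc, field, t) for t in terms)


def bool_score(doc: dict, terms: list) -> float:
    """'must' on city plus 'should' on country, summed as bool does for matching docs."""
    return dis_max_score(doc, "city", terms) + dis_max_score(doc, "country", terms)


scores = {name: bool_score(doc, QUERY_TERMS) for name, doc in DOCS.items()}
print(scores)  # {'Mexico City': 2.0, 'Cancun': 2.0}
```

Because each `dis_max` only reports its single best clause, the model cannot tell "matched Mexico twice" apart from "matched Cancun and Mexico once each".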

The actual queries I'm using are a bit more complicated, but the problem is the same: how do I reward documents where more (or all) of the original query terms matched? I can pass all terms to the scoring script if necessary; I just don't know how to find out which of those terms occur in the document.
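One way to state "reward documents where more of the original terms matched" is a coordination-style factor: the fraction of distinct query terms found in any of the searched fields, multiplied into the score. The sketch below computes that outside Elasticsearch, in plain Python, using the same flat-score toy model; the function names are hypothetical, and this is neither how Elasticsearch scores documents nor a scoring script for it.

```python
# Hypothetical coordination-style boost, computed client-side for illustration only.
QUERY_TERMS = ["Cancun", "Mexico"]
FIELDS = ["city", "country"]

DOCS = {
    "Mexico City": {"country": "Mexico", "city": "Mexico"},
    "Cancun": {"country": "Mexico", "city": "Cancun"},
}


def matched_terms(doc: dict, terms: list, fields: list) -> set:
    """Return the distinct query terms found in at least one of the given fields."""
    return {t for t in terms if any(doc.get(f) == t for f in fields)}


def base_score(doc: dict, terms: list) -> float:
    """Same flat-score model: best city match plus best country match."""
    city = max(1.0 if doc.get("city") == t else 0.0 for t in terms)
    country = max(1.0 if doc.get("country") == t else 0.0 for t in terms)
    return city + country


def coord_score(doc: dict, terms: list, fields: list) -> float:
    """Scale the base score by the fraction of distinct query terms matched."""
    coverage = len(matched_terms(doc, terms, fields)) / len(terms)
    return base_score(doc, terms) * coverage


for name, doc in DOCS.items():
    print(name, sorted(matched_terms(doc, QUERY_TERMS, FIELDS)), coord_score(doc, QUERY_TERMS, FIELDS))
# Mexico City ['Mexico'] 1.0
# Cancun ['Cancun', 'Mexico'] 2.0
```

Multiplying by coverage leaves a document that matches every term unchanged and halves the score of one that matches only one of two terms, so in this toy model Cancun now outranks Mexico City.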

