Fuzzy, nGrams... for incorrectly spelt phrases?


(Squalli2008) #1

The documents I am looking for have the following titles:

"Old Book of History"
"New Book of History"
"The Book of Histories"

I would like to search for either of the following and get all three documents back, with "Old Book of History" at the top of the results:

old histry (i.e. a simple spelling mistake in one of the words)
bok old histry (i.e. spelling mistakes in two words, with the words in any order)

So far I have this working by mapping the title to a new title.titleNGram field that uses an nGram analyzer at index time and a snowball analyzer at search time.
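
A mapping along those lines might look roughly like the sketch below. It is an illustration only, not the exact mapping in use: the filter and analyzer names (example_ngram_filter, example_ngram_analyzer) and the gram sizes are placeholders, and the syntax assumes a recent Elasticsearch version (older releases spell the filter type nGram and use string fields).

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "example_ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "example_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "example_ngram_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "titleNGram": {
            "type": "text",
            "analyzer": "example_ngram_analyzer",
            "search_analyzer": "snowball"
          }
        }
      }
    }
  }
}
```

With something like this, title keeps its default analysis for exact matches, while title.titleNGram indexes two- and three-character grams of each word, so a misspelling such as histry still shares most of its grams with history.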

To boost exact matches, the query looks something like this:

  {
    "query": {
      "bool": {
        "should": [
          {
            "match": {
              "title": {
                "query": "history",
                "boost": 10
              }
            }
          },
          {
            "match": {
              "title.titleNGram": {
                "query": "history",
                "fuzziness": 2,
                "operator": "and",
                "prefix_length": 3,
                "analyzer": "snowball"
              }
            }
          }
        ]
      }
    }
  }

So my question is really: is this the right way to be doing this?

