Reasonable values for fuzziness, prefix_length, and max_expansions


(clay.wardell) #1

I'm trying to implement a business search and I want to allow for misspellings and typos in search queries. I'm sure I will have to tune things by trial and error, but can anyone provide me with reasonable starting points for fuzziness, prefix_length, and max_expansions in my TextQueries? All of them seem relevant to the fuzzy matching process.
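To make the three parameters concrete, here is a rough Python sketch of how they might constrain the expansion of a single query term against the index's term dictionary. The function names are hypothetical. It assumes fuzziness is a maximum Levenshtein edit distance, prefix_length is the number of leading characters that must match exactly, and max_expansions caps how many dictionary terms the query is rewritten into. Lucene's real FuzzyQuery is implemented differently and may pick and rank candidates differently. This only illustrates what each knob limits.

```python
# Hypothetical sketch of how fuzziness, prefix_length and max_expansions
# interact when one query term is expanded against a term dictionary.
# NOT Lucene's implementation; it only illustrates what each knob limits.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def expand_fuzzy(term, dictionary, fuzziness=2, prefix_length=0, max_expansions=50):
    """Return the dictionary terms a fuzzy query for `term` would expand to."""
    prefix = term[:prefix_length]
    candidates = []
    for candidate in dictionary:
        # prefix_length: the first N characters must match exactly, which
        # discards most terms before any distance is computed.
        if not candidate.startswith(prefix):
            continue
        distance = edit_distance(term, candidate)
        # fuzziness (assumed here to be an edit count): max edits allowed.
        if distance <= fuzziness:
            candidates.append((distance, candidate))
    # max_expansions: keep only the closest N terms (ties broken alphabetically).
    candidates.sort()
    return [c for _, c in candidates[:max_expansions]]


terms = {"pizza", "pizzeria", "plaza", "pita", "piazza", "bakery"}
print(expand_fuzzy("piza", terms, fuzziness=1))                   # ['pita', 'pizza']
print(expand_fuzzy("piza", terms, fuzziness=2))                   # ['pita', 'pizza', 'piazza', 'plaza']
print(expand_fuzzy("piza", terms, fuzziness=2, prefix_length=2))  # ['pita', 'pizza', 'piazza']
```

Raising prefix_length shrinks the candidate set before any distance is computed, which is why it tends to help performance; the trade-off is that typos in the first few characters are no longer caught.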

My index consists of about 300,000 searchable documents. As for query volume, we are trying to launch a website supporting a few thousand users in the short term. I know the values I'm asking about affect performance, so I thought that it might be relevant to give you all an idea of our scale.

Thanks!

Clay


(Ævar Arnfjörð Bjarmason) #2

On Thu, Aug 30, 2012 at 7:44 PM, clay.wardell <clay@blueshiftlocal.com> wrote:

> I'm trying to implement a business search and I want to allow for
> misspellings and typos in search queries. I'm sure I will have to tune
> things by trial and error, but can anyone provide me with reasonable
> starting points for fuzziness, prefix_length, and max_expansions in my
> TextQueries? All of them seem relevant to the fuzzy matching process.
>
> My index consists of about 300,000 searchable documents. As for query
> volume, we are trying to launch a website supporting a few thousand users in
> the short term. I know the values I'm asking about affect performance, so I
> thought that it might be relevant to give you all an idea of our scale.

Note that fuzzy queries are quite expensive. I've had much better search
results (and better performance) by using nGram tokenizers to do fuzzy
matching like this.
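A rough Python sketch of why the nGram approach tolerates typos: each word is broken into overlapping character n-grams, and a misspelled query still shares most of its n-grams with the correctly spelled term. The function names are hypothetical, and the simple overlap fraction used here is an assumption made for illustration; Elasticsearch would instead score the matching n-gram tokens with its normal relevance scoring. The min_gram/max_gram arguments mirror the nGram tokenizer's settings of the same name.

```python
# Hypothetical sketch: why n-gram indexing tolerates typos.
# The n-grams would be produced once at index time, so a query becomes an
# ordinary term lookup instead of a per-query fuzzy expansion.


def char_ngrams(text, min_gram=2, max_gram=3):
    """Set of overlapping character n-grams, similar in spirit to an nGram tokenizer."""
    text = text.lower()
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def ngram_overlap(query, term, min_gram=2, max_gram=3):
    """Fraction of the query's n-grams that also occur in the term (0.0 to 1.0)."""
    q = char_ngrams(query, min_gram, max_gram)
    if not q:
        return 0.0
    return len(q & char_ngrams(term, min_gram, max_gram)) / len(q)


def best_match(query, terms):
    """Pick the indexed term sharing the most n-grams with the query."""
    return max(terms, key=lambda t: ngram_overlap(query, t))


# "resturant" (missing an 'a') shares 12 of its 15 n-grams with "restaurant".
print(ngram_overlap("resturant", "restaurant"))
print(best_match("resturant", ["bakery", "restaurant", "rest stop"]))  # restaurant
```

The trade-off is a larger index, since every word is stored as many n-gram tokens, in exchange for cheaper queries.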


