On Wed, Jul 6, 2011 at 11:49 AM, Shay Banon
shay.banon@elasticsearch.com wrote:
Well, for these simplistic suggestion types on unanalyzed content,
there is no problem, you just need to select a Lookup implementation
that supports add() (e.g. TST, Jaspell, but not FST)
Not sure they are that simplistic :), since it needs to be concurrent (even
a read-write lock around it will be expensive) and allow for "deletion" (as
in reducing a counter).
I think that's really overkill? You could always have terms in lucene
whose docs are all deleted, and those will affect spellcheck, too.
But in both cases, for a normal search engine suggest, the following hold true:
- normally you filter only high-freq terms (HighFrequencyDictionary or
thresholdFrequency in lucene), so the chance of spellcorrecting to a
term that exists only in deleted docs is minimized.
- similar for suggest, the chance is minimal. E.g. if you are building
from, say, terms of the past N days of query logs, I don't think you gain
much by going to so much effort to expunge these all-deleted terms for
things people stopped querying on; a periodic rebuild is
sufficient for your suggest to represent the past N-days trend.
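To make the two points above concrete, here is a rough sketch in plain Python. It is not Lucene code: the function names, the (timestamp, query) log format, and the defaults for window_days and min_freq are all made up for illustration. It only shows the shape of the idea, i.e. rebuild from the last N days and drop low-frequency terms, instead of tracking deletions.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_suggest_terms(query_log, now, window_days=7, min_freq=2):
    """Rebuild the suggest dictionary from scratch (hypothetical helper).

    query_log: iterable of (timestamp, query_string) pairs (assumed format).
    Only queries from the last `window_days` days are counted, and only
    terms seen at least `min_freq` times are kept.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(q.strip().lower() for ts, q in query_log if ts >= cutoff)
    return {term: n for term, n in counts.items() if n >= min_freq}


def suggest(terms, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(t, n) for t, n in terms.items() if t.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in matches[:limit]]
```

Calling build_suggest_terms again on a schedule (say nightly) is the "periodic rebuild": terms people stopped querying simply fall out of the window, with no explicit delete step.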
But in general, this is the use case that the suggest/spellcheck
framework is geared towards (along with supplying floating point
quality weights, etc). If instead you want to do a suggest that acts
more like a primitive prefix query on a transactional database and
less like a search engine, I think using edge ngrams is the way to go,
as lucene will take care of that stuff for you.
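For the edge ngram approach, a toy illustration in plain Python of why deletes come for free: each term is indexed under every one of its prefixes, so removing a document just removes it from those posting lists. This is only a sketch of the idea, not how Lucene implements it; EdgeNgramIndex, its methods, and the min_len/max_len defaults are hypothetical names chosen for the example.

```python
from collections import defaultdict


def edge_ngrams(term, min_len=1, max_len=10):
    """Return the leading-edge n-grams (prefixes) of `term`."""
    upper = min(len(term), max_len)
    return [term[:n] for n in range(min_len, upper + 1)]


class EdgeNgramIndex:
    """Toy inverted index mapping prefix -> set of doc ids (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        # Re-adding an existing id replaces its old entry.
        self.delete(doc_id)
        term = text.lower()
        self.docs[doc_id] = term
        for gram in edge_ngrams(term):
            self.postings[gram].add(doc_id)

    def delete(self, doc_id):
        # Remove the doc from every posting list it was indexed under.
        term = self.docs.pop(doc_id, None)
        if term is None:
            return
        for gram in edge_ngrams(term):
            self.postings[gram].discard(doc_id)

    def suggest(self, prefix):
        # A prefix lookup is a single exact-match lookup on the gram.
        # Prefixes longer than max_len match nothing in this toy version.
        ids = self.postings.get(prefix.lower(), set())
        return sorted(self.docs[i] for i in ids)
```

The trade-off is index size: every term costs one posting per prefix, in exchange for adds and deletes that are visible immediately, like a prefix query on a database.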