Auto suggest with elasticsearch

On Wed, Jul 6, 2011 at 11:49 AM, Shay Banon
shay.banon@elasticsearch.com wrote:

Well, for these simplistic suggestion types on unanalyzed content,
there is no problem: you just need to select a Lookup implementation
that supports add() (e.g. TST, Jaspell, but not FST).

Not sure they are that simplistic :), since it needs to be concurrent (even
a read-write lock around it will be expensive) and allow for "deletion" (as
in reducing a counter).

I think that's really overkill? You could always have terms in Lucene
whose docs are all deleted, which will affect spellcheck, too.

But in both cases, for a normal search-engine suggest, the following holds true:

  • Normally you keep only high-frequency terms (HighFrequencyDictionary or
    thresholdFrequency in Lucene), so the chance of spell-correcting to a
    term whose docs are all deleted is minimized.
  • Similarly for suggest, the chance is minimal. E.g. if you are building
    from, say, terms in the past N days of query logs, I don't think you
    gain much by going to so much effort to expunge these all-deleted
    terms for things people stopped querying on; a periodic rebuild is
    sufficient for your suggest to represent the past N days' trend.
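
To make the two points above concrete, here is a rough sketch in plain
Python (not Lucene code) of a suggest dictionary that is simply rebuilt
from the last N days of query logs with a frequency threshold. The log
format, the function names and the parameters (n_days, min_freq) are all
made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def build_suggest_terms(query_log, today, n_days, min_freq):
    """Rebuild the suggest dictionary from scratch.

    query_log is an iterable of (date, query) pairs (a hypothetical log
    format). Only queries newer than n_days are counted, and only terms
    seen at least min_freq times are kept.
    """
    cutoff = today - timedelta(days=n_days)
    counts = Counter(query for day, query in query_log if day > cutoff)
    return {term: count for term, count in counts.items() if count >= min_freq}


def suggest(terms, prefix, limit=5):
    """Return up to `limit` terms starting with prefix, most frequent first."""
    matches = [(t, c) for t, c in terms.items() if t.startswith(prefix)]
    matches.sort(key=lambda tc: (-tc[1], tc[0]))
    return [t for t, _ in matches[:limit]]
```

Nothing is ever deleted here: a term people stopped querying on just fails
to make it into the next rebuild, either because it falls outside the
window or because it drops below the threshold.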

But in general, this is the use case that the suggest/spellcheck
framework is geared towards (along with supplying floating point
quality weights, etc). If instead you want to do a suggest that acts
more like a primitive prefix query on a transactional database and
less like a search engine, I think using edge ngrams is the way to go,
as Lucene will take care of that stuff for you.
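
For anyone unfamiliar with the edge ngram idea, here is a toy sketch in
plain Python (not the Lucene or elasticsearch API) of what it amounts to:
each word is indexed under all of its leading prefixes, so a prefix lookup
becomes an exact term lookup, and adds and deletes go through the ordinary
document lifecycle. The class name, method names and the min_gram/max_gram
defaults are assumptions for illustration only.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the leading prefixes of term with length min_gram..max_gram."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


class EdgeNgramIndex:
    """Toy in-memory inverted index keyed by edge n-grams."""

    def __init__(self, min_gram=1, max_gram=10):
        self.min_gram = min_gram
        self.max_gram = max_gram
        self.postings = defaultdict(set)  # gram -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def _grams(self, text):
        # Lowercase, split on whitespace, then emit every word's prefixes.
        for word in text.lower().split():
            yield from edge_ngrams(word, self.min_gram, self.max_gram)

    def add(self, doc_id, text):
        self.delete(doc_id)  # re-adding a doc replaces the old version
        self.docs[doc_id] = text
        for gram in self._grams(text):
            self.postings[gram].add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for gram in self._grams(text):
            ids = self.postings.get(gram)
            if ids is not None:
                ids.discard(doc_id)
                if not ids:
                    del self.postings[gram]

    def suggest(self, prefix):
        # The prefix is looked up as-is: no n-gramming at query time.
        ids = self.postings.get(prefix.lower(), ())
        return sorted(self.docs[i] for i in ids)
```

Deleting a document removes it from suggestions immediately, which is the
"transactional" behaviour mentioned above. The cost is a larger index, and
in this sketch a prefix longer than max_gram matches nothing.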