Hi,
Yeah, auto suggest can be done in several ways. The first is
simply executing a query against the relevant field and getting only that
field back. That field can be analyzed (for example, using ngrams) if it makes
sense for the auto suggestion.
@mike: I saw the suggest module was added. The problem with it is that
it requires a full rebuild and can't be dynamically updated to reflect
changes in the index. This requires a system where periodic rebuilds are
done, and personally, I'm not a big fan of that :). Rebuilds can become
expensive (file system cache invalidation) and non intuitive (they lack real
or near real timeness). Though, the FST based one is cool :). If an in
memory auto suggest is required on a field, I was thinking of a non blocking
trie based data structure (derived from ConcurrentSkipList), but that
requires work.
-shay.banon
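To make the in-memory trie idea above concrete, here is a minimal sketch in Python. It is a plain, single-threaded prefix trie, not the non-blocking structure described in the message, and the names PrefixTrie, insert and suggest are made up for illustration.

```python
class PrefixTrie:
    """Minimal in-memory prefix trie (single-threaded illustration only)."""

    def __init__(self):
        self.children = {}      # maps one character to a child node
        self.terminal = False   # True if a complete term ends at this node

    def insert(self, term):
        """Add a term; it is visible to suggest() immediately, no rebuild."""
        node = self
        for ch in term:
            node = node.children.setdefault(ch, PrefixTrie())
        node.terminal = True

    def suggest(self, prefix, limit=10):
        """Return up to `limit` stored terms starting with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.terminal:
                results.append(text)
            # Push children in reverse order so the smallest is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = PrefixTrie()
for name in ["foobar", "foo", "foo baz", "bar"]:
    trie.insert(name)
print(trie.suggest("foo"))
```

Because terms are inserted one at a time, a structure like this can follow index changes without a periodic full rebuild; making it safe for concurrent readers and writers is the part that "requires work".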
On Monday, July 4, 2011 at 3:49 PM, Michael McCandless wrote:
Also, as of Lucene 3.3, the spellchecker module
(lucene/contrib/spellchecker) now has 3 implementations for
auto-suggest, so in theory ES can just expose these?
Mike
http://blog.mikemccandless.com
On Mon, Jul 4, 2011 at 4:37 AM, Weiwei Wang ww.wang.cs@gmail.com wrote:
Auto suggest can be implemented by adding a field tokenized by
StandardTokenizer (or another tokenizer) and filtered by EdgeNGramFilter or
NGramFilter, according to your project requirements.
You can look into the Lucene contrib Spellchecker for some insight.
This article
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html
should also be helpful for understanding search suggestion.
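As a rough illustration of what an edge n-gram filter contributes at index time, here is a small Python sketch. It does not call Lucene; lowercasing and whitespace splitting stand in for the tokenizer, and the function names and the min_gram/max_gram defaults are assumptions chosen for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of `token` with length min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=10):
    """Lowercase, split on whitespace (a stand-in tokenizer), then edge n-gram."""
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


print(index_terms("San Francisco", 1, 3))
```

With terms like these in the index, a plain term query for whatever the user has typed so far matches every document containing a word that starts with it. A full n-gram filter would emit substrings from every position, which also allows matches in the middle of a word at the cost of a larger index.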
On Jul 4, 4:01 pm, stephane stephane.bastian....@gmail.com wrote:
Hello,
On Mon, 2011-07-04 at 00:15 -0700, Alexander Reelsen wrote:
Hi there,
I am currently trying to find the most elegant way to implement an auto
suggest feature for my elasticsearch instance.
Currently there is only an index with products, which includes product
name, description, some image urls, some stock data etc...
Now I want to implement auto suggest only for the product name.
The first question is: Is it useful to use this product index or to
use a specific auto completion index for this? To keep the returned
data as small as possible I do not want to return all the product data
with each letter typed.
I am imagining an autocompletion solution like this and would love to
get some input whether this is useful at all and could be implemented
with elasticsearch:
- Create a handler (not sure if it's an index) on /suggest/
- On any request like /suggest/foo or /suggest/foobar search for an
entry in the suggest index(?) and return something like
{
key: "foo",
suggestions: [ "foob", "fooba", "foobar", "foobored", "foo baz" ],
something: "else"
}
ES is definitely a good candidate to provide auto-suggest type
functionality.
IMHO the first thing you've got to focus on is deciding the type of
auto-suggest you are looking for.
In your example ("foob", "fooba", "foobar"...), the auto-suggest seems
to be prefix-based. But is this always the case?
Also, what do you expect if someone enters a word which does not match
the prefix? (For instance, if the user enters "francisco", do you expect
"san francisco" to be displayed?)
Another thing to consider is the expected behavior with sentences.
Off the top of my head, you should also decide whether or not you'll need to
sort or highlight the results.
Based on this, you can make the right decision as to the most
appropriate combination of Tokenizer/Filters/Queries to use.
If you want to implement a quick autocompletion, I suggest that you look
at the "text_phrase_prefix" query. It should give you a good head start:
http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html
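A rough sketch of a search request body using the "text_phrase_prefix" query mentioned above, built in Python. The field name "name", the size of 5, and the top-level "fields" entry (intended to return only the product name rather than the whole document) are assumptions for illustration; check them against the text query documentation for your ES version.

```python
import json


def phrase_prefix_body(field, typed, size=5):
    """Build a search body that prefix-matches the last word the user typed."""
    return {
        # Query type named in this thread; the last term is treated as a prefix.
        "query": {"text_phrase_prefix": {field: typed}},
        # Assumption: ask only for the suggested field to keep responses small.
        "fields": [field],
        # Assumption: a handful of suggestions is enough per keystroke.
        "size": size,
    }


body = phrase_prefix_body("name", "foo b")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a search request against the product index on each keystroke.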
Hope this helps.
Since auto-suggest seems to be popular, how about if we enhance ES doc
with the most common auto-suggest use cases and solutions? (with pros
and cons, snippets of code, tokenizer/filter/query to use and such).
If other people are interested, I can certainly contribute to it.
So, would something like this be useful and can it be automatically
created out of the product feed (or any feed to be more generic) to
ensure it includes all my product names (would not need to be
realtime, but updating once a day would be ok as well). Always adding
all possible typing combinations manually sounds like quite some
overhead, if there could be analyzers doing this work...
Thanks for any input!
--Alexander