What are good combinations of search analyzer, index analyzer and query for implementing an effective autocompleter using ElasticSearch?


(dark_shadow) #1

Hi,

Elasticsearch is a powerful tool, no doubt, but it can really make you cry
when you cannot find a good combination of index and search analyzers,
along with a good query type, for implementing an autocompleter. From what
I have read and found on the Internet, this combination differs from one
use case to another, and the only way to find it is to try and test it
yourself. But can't we generalize this use case to some extent?

I'll take a base case where a document contains a title as a string and a
description as a string. Most people implement autocompleters around such
docs. The basic expectation of an autocompleter is to find the most
appropriate document for a user query. A good autocompleter returns docs
which exactly match the user query, but since the user query can vary a lot
from the actual content, the best an autocompleter can do is return the
docs which contain the most user-typed terms, and that is accomplished by a
good query mechanism. So, can't we generalize a good combination for this
base case? After that, people can just extend that base case for the other
parameters of their docs.
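
To make the "docs which contain the most user-typed terms" expectation concrete, here is a minimal plain-Python sketch. It does not call Elasticsearch at all; the sample docs and the function names `count_matching_terms` and `rank` are made up for illustration, and prefix matching on lowercased words is an assumption about what "matching" should mean here.

```python
def count_matching_terms(query, doc):
    """Count how many typed terms are a prefix of some word in the doc."""
    words = (doc["title"] + " " + doc["description"]).lower().split()
    terms = query.lower().split()
    return sum(1 for term in terms if any(w.startswith(term) for w in words))


def rank(query, docs):
    """Return docs with at least one matching term, most matching terms first."""
    scored = [(count_matching_terms(query, d), d) for d in docs]
    matched = [(score, d) for score, d in scored if score > 0]
    # The sort is stable, so docs with equal counts keep their input order.
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in matched]


# Hypothetical sample data with the title/description shape described above.
docs = [
    {"title": "Elasticsearch basics", "description": "getting started guide"},
    {"title": "Search analyzers", "description": "elasticsearch index and search analyzers"},
    {"title": "Cooking pasta", "description": "a quick recipe"},
]

for doc in rank("elastic search anal", docs):
    print(doc["title"])
```

The second doc comes first because all three typed terms match it, while the first doc only matches "elastic". A real query would have to reproduce this ordering through scoring.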

I think people who have spent time with ElasticSearch are aware of the pros
and cons of almost every possible combination of these things. So it could
be good to start a thread where people share their thoughts, experiences
and suggestions on different possible combinations of analyzers and query
types, so that beginners don't have to struggle so much initially with
ElasticSearch.

I'll start by sharing my combination (obviously it is not the best one, and
I'm still working on it to improve the effectiveness of my autocompleter):

I have used the standard tokenizer along with these token filters:
lowercase, asciifolding, suggestion_shingle, edgengrams (front). I have
used the same analyzer for both searching and indexing. For the query type,
I'm using a custom score query, but somehow the results are not that
effective/tuned. I expect my autocompleter to give documents which contain
the most matching terms from a user-typed query, but it's not giving
results that way. I'm still working on fine-tuning it.
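
For reference, the analyzer chain described above could be written roughly like this, as a Python dict to be serialized into index settings. Only the tokenizer, the filter order and the filter names `suggestion_shingle` and `edgengrams` come from the description; the analyzer name `autocomplete` and every parameter value (shingle sizes, gram lengths) are assumptions. The `edgeNGram` type name and the `side` option are the older spellings; newer releases call the type `edge_ngram`. The small `edge_ngrams` helper is only a plain-Python stand-in to show what the last filter emits.

```python
import json

# Filter names come from the post; all parameter values are assumptions.
settings = {
    "analysis": {
        "filter": {
            "suggestion_shingle": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 3,
            },
            "edgengrams": {
                "type": "edgeNGram",
                "min_gram": 1,
                "max_gram": 20,
                "side": "front",
            },
        },
        "analyzer": {
            "autocomplete": {  # hypothetical analyzer name
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "asciifolding",
                    "suggestion_shingle",
                    "edgengrams",
                ],
            }
        },
    }
}


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Front edge n-grams of one token (plain-Python stand-in for the filter)."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


print(json.dumps(settings, indent=2))
print(edge_ngrams("search"))
```

Because the same chain also runs at search time, the typed text is itself expanded into prefixes, so a query such as "sea" also carries "s" and "se". That may be one reason the ranking does not favour docs with the most matching terms.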

I think the above combination solves the problem to a certain extent, but
there are still a hell of a lot of other ways to go about it which I'm not
aware of.

I request you all to please give some suggestions and views, and share your
personal experiences of tackling this particular problem.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/81b589c2-b1a1-4f8e-8b3a-8e377e864123%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

There is a massive effort to implement autosuggest completion in the most
convenient ways.

Since 0.90.3, the Lucene suggester has been implemented in ES:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

The Lucene FST is faster and more compact than n-grams and may serve most
use cases well.
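
A rough sketch of what using the completion suggester involves, written as Python dicts for the JSON bodies. The field name `suggest`, the suggestion name `title-suggest` and the sample document are made up. The request shape (`text` plus `completion.field`) follows the suggest API of the 0.90.x/1.x era as far as I remember it; later releases nest the suggestion under a `suggest` key of a search request, so check the documentation linked above for the version in use.

```python
import json

# Hypothetical field name "suggest"; the field type "completion" is what
# enables the FST-based suggester for that field.
mapping = {"properties": {"suggest": {"type": "completion"}}}

# Each document lists the inputs it should be suggested for.
document = {
    "title": "Nevermind",
    "suggest": {"input": ["Nevermind", "Nirvana Nevermind"]},
}


def suggest_body(text, field="suggest", name="title-suggest"):
    """Build a completion suggest request body for the typed text."""
    return {name: {"text": text, "completion": {"field": field}}}


print(json.dumps(mapping))
print(json.dumps(document))
print(json.dumps(suggest_body("nev")))
```

Only inputs listed under `input` are candidates, which is why a second, reordered input is included in the sample document.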

But there is no general solution to autocomplete, just as there is none for
search in general. It depends on the words in the index and how to search
them. E.g. for the German language, you probably need extra analysis for
normalization forms, like decompounding and baseform reduction, to better
support what the user wants.

If you look at (older) solutions that do not use the Lucene FST, you can
use edgeNgram, a linguistic method that takes considerably more space. A
demo is here:

http://jprante.github.io/applications/2012/08/17/Autocompletion-with-jQuery-JAX-RS-and-Elasticsearch.htm

Jörg



(dark_shadow) #3

Jörg,

The second link is not working. The completion suggester is a good thing,
but it is restricted to prefix queries only, I guess. You would have to
supply every possible combination of a user-typed query for a document to
be matched. Please correct me if I'm wrong.

Thanks




(Jörg Prante) #4

No, the suggester is not restricted to exact prefixes; in 0.90.4 fuzziness
was added, as documented. Fuzzy suggest completion means your query may
contain errors within an edit distance.
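
To illustrate what "errors within an edit distance" means, here is a small plain-Python sketch. It uses the textbook Levenshtein distance and a brute-force check against every prefix, which is an assumption made for clarity and not how the Lucene FST suggester is implemented. The function names and sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_prefix_match(typed, candidate, max_edits=1):
    """True if `typed` is within max_edits of some prefix of `candidate`."""
    typed, candidate = typed.lower(), candidate.lower()
    return any(
        edit_distance(typed, candidate[:n]) <= max_edits
        for n in range(len(candidate) + 1)
    )


print(edit_distance("nirvna", "nirvana"))
print(fuzzy_prefix_match("nirvna", "Nirvana Nevermind"))
print(fuzzy_prefix_match("nevermind", "Nirvana Nevermind"))
```

The typo "nirvna" still matches because it is one edit away from the prefix "nirvana", while "nevermind" does not, since it is not close to any prefix of the stored input.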

Fixed link (the final 'l' was truncated)

http://jprante.github.io/applications/2012/08/17/Autocompletion-with-jQuery-JAX-RS-and-Elasticsearch.html

Jörg



(Clinton Gormley) #5

On 1 February 2014 20:19, joergprante@gmail.com wrote:

No, suggester is not restricted to prefix, in 0.90.4 fuzziness was added,
as documented. Fuzzy suggest completion means your query may contain errors
within an edit distance.

But it is still a prefix suggester... You can't mix up the order of words.
This works really well for well-formulated names, e.g. song titles. But for
general search it can fail to match.

It's worth using the completion suggester as the first-line search and
falling back to edge ngrams if there are no matches.
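
The first-line-then-fallback idea can be sketched in plain Python as below. Both lookups are in-memory stand-ins, not Elasticsearch calls: `prefix_suggest` mimics a prefix suggester over whole titles, and `word_prefix_search` mimics an edge n-gram query where each typed term must be a prefix of some word. All names and the sample titles are hypothetical.

```python
def prefix_suggest(text, titles):
    """Stand-in for the completion suggester: whole-title prefix match."""
    text = text.lower()
    return [t for t in titles if t.lower().startswith(text)]


def word_prefix_search(text, titles):
    """Stand-in for an edge n-gram query: every term must prefix some word."""
    terms = text.lower().split()
    return [
        t for t in titles
        if all(any(w.startswith(term) for w in t.lower().split()) for term in terms)
    ]


def autocomplete(text, titles):
    """Try the prefix suggester first; fall back only when it finds nothing."""
    return prefix_suggest(text, titles) or word_prefix_search(text, titles)


titles = ["Smells Like Teen Spirit", "Come As You Are"]
print(autocomplete("smells li", titles))    # answered by the prefix path
print(autocomplete("teen smells", titles))  # word order mixed up, fallback path
```

With words in title order the prefix path answers directly; with the order mixed up it returns nothing and the word-level fallback finds the title.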


