What are good combinations of search analyzer, index analyzer and query for implementing an effective autocompleter using ElasticSearch?

dark_shadow · February 1, 2014, 6:15pm

Hi,

Elasticsearch is a powerful tool, no doubt, but it can really make you cry
when you cannot find a good combination of index analyzer, search analyzer
and query type for implementing an autocompleter. From what I have read,
this combination differs from one use case to another, and the only way to
find the right one is to try and test it yourself. But can't we generalize
this use case to some extent?

I'll take a base case where a document contains a title string and a
description string. Most people build their autocompleter around documents
like these. The basic expectation of an autocompleter is to find the most
appropriate document for a user query. A good autocompleter returns docs
which exactly match the user query, but since the user query can differ a
lot from the actual content, the best an autocompleter can do is return the
docs which contain the most user-typed terms, and that is accomplished by a
good query mechanism. So, can't we generalize a good combination for this
base case? After that, people can just extend that base case for other
parameters of their docs.

I think people who have spent time with Elasticsearch are aware of the pros
and cons of almost every possible combination of these things. So it could
be good to start a thread where people share their thoughts, experiences
and suggestions on different possible combinations of analyzers and query
types, so that beginners don't have to struggle so much initially with
Elasticsearch.

I'll start by sharing my combination (obviously it is not the best one, and
I'm still working on improving the effectiveness of my autocompleter):

I have used the standard tokenizer along with these token filters:
lowercase, asciifolding, suggestion_shingle, edgengrams (front). I have
used the same analyzer for both searching and indexing. For the query type,
I'm using a custom score query, but somehow the results are not that
effective/tuned. I expect my autocompleter to return the documents which
contain the most matching terms from a user-typed query, but it's not
giving results that way. I'm still working on fine-tuning it.
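
To make the combination above concrete, here is a minimal sketch of index
settings for it, written as a Python dict. The names autocomplete_index,
autocomplete_search and front_edge_ngram are made up for illustration, the
shingle and n-gram sizes are assumptions (the actual suggestion_shingle
configuration is not given here), and the edge_ngram filter type name
follows recent Elasticsearch releases. The second analyzer is a variation,
not what the post describes: it leaves out the edge n-grams on the search
side.

```python
import json


def autocomplete_settings(min_gram=1, max_gram=20):
    """Build index settings for the analyzer chain described in the post.

    All names and sizes here are illustrative assumptions.
    """
    return {
        "analysis": {
            "filter": {
                # Assumed shingle sizes; the post does not state them.
                "suggestion_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                },
                # Front edge n-grams: "search" -> s, se, sea, ...
                "front_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": min_gram,
                    "max_gram": max_gram,
                },
            },
            "analyzer": {
                # The chain from the post, used at index time.
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "asciifolding",
                        "suggestion_shingle",
                        "front_edge_ngram",
                    ],
                },
                # Variation: same normalization, no n-grams at search time.
                "autocomplete_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(autocomplete_settings(), indent=2))
```

The reason for the variation: when the search side also produces edge
n-grams, a query such as "sea" is expanded to s, se and sea, so it can match
any document containing a word that starts with "s", which may be one cause
of loosely ranked results.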

I think the above combination solves the problem to a certain extent, but
there are a lot of other ways to go about it which I'm not aware of.

I request you all to please give some suggestions and views, and share your
personal experiences of tackling this particular problem.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/81b589c2-b1a1-4f8e-8b3a-8e377e864123%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · February 1, 2014, 6:33pm

There is a massive effort to implement autosuggest completion in the most
convenient ways.

Since 0.90.3, there is the Lucene suggester implemented in ES

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

The Lucene FST is faster and more compact than n-grams and may serve most
use cases well.
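
As a rough sketch of what using the completion suggester involves, the
helpers below build a mapping with a field of type completion and a
suggest request body against it. The field name suggest and the suggestion
name title_suggest are placeholders, and the request shape follows recent
Elasticsearch releases; the 0.90 releases used a different request layout,
so check the documentation for your version.

```python
def completion_mapping(field="suggest"):
    """Mapping fragment declaring a completion field (field name assumed)."""
    return {"properties": {field: {"type": "completion"}}}


def completion_request(prefix, field="suggest", size=5):
    """Suggest request body asking for completions of `prefix`."""
    return {
        "suggest": {
            # "title_suggest" is an arbitrary label for this suggestion.
            "title_suggest": {
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


if __name__ == "__main__":
    print(completion_mapping())
    print(completion_request("nir"))
```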

But there is no general solution to autocomplete, just as there is none for
search in general. It depends on the words in the index and how you search
them. E.g. for the German language, you probably need extra analysis for
normalization forms, like decompounding and baseform reduction, to better
support what the user wants.

If you look at (older) solutions that do not use Lucene FST, you can use
edgeNgram, a linguistic method that takes considerably more space. A demo
is here

http://jprante.github.io/applications/2012/08/17/Autocompletion-with-jQuery-JAX-RS-and-Elasticsearch.htm
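
To see why edge n-grams take more space, here is a toy Python model of what
a front edge n-gram filter emits. It is only an illustration (whitespace
splitting and lowercasing stand in for the real analysis chain), and the
function names are made up.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the front edge n-grams of one token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=20):
    """Lowercase, split on whitespace, and expand each token to n-grams."""
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


if __name__ == "__main__":
    print(edge_ngrams("search"))
    print(len(index_terms("Elastic Search")))
```

With min_gram=1, the two words "Elastic Search" expand to 13 indexed terms
(7 for "elastic", 6 for "search"), so the term count grows with the length
of every word rather than with the number of words.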

Jörg

dark_shadow · February 1, 2014, 6:40pm

Jörg,

The second link is not working. The completion suggester is a good thing,
but I guess it is restricted to prefix queries only. You would have to
supply every possible combination of a user-typed query for a document to
be matched. Please correct me if I'm wrong.

Thanks


jprante · February 1, 2014, 7:19pm

No, the suggester is not restricted to prefix; in 0.90.4 fuzziness was
added, as documented. Fuzzy suggest completion means your query may contain
errors within an edit distance.
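
As a toy illustration of "errors within an edit distance", the Python
sketch below accepts a candidate if the typed text is within max_edits of
some prefix of it. This is not how Lucene implements it (Lucene works with
automata over the FST); the function names are made up and the code only
models the matching rule.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_prefix_match(query, candidate, max_edits=1):
    """True if query is within max_edits of some prefix of candidate."""
    return any(
        levenshtein(query, candidate[:n]) <= max_edits
        for n in range(len(candidate) + 1)
    )


if __name__ == "__main__":
    print(fuzzy_prefix_match("nrv", "nirvana", max_edits=1))
    print(fuzzy_prefix_match("nrv", "nirvana", max_edits=0))
```

Here "nrv" matches "nirvana" with one edit allowed (the missing "i"), but
not with zero edits.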

Fixed link (the final 'l' was truncated)

http://jprante.github.io/applications/2012/08/17/Autocompletion-with-jQuery-JAX-RS-and-Elasticsearch.html

Jörg

Clinton_Gormley · February 3, 2014, 10:17am

On 1 February 2014 20:19, joergprante@gmail.com wrote:

No, suggester is not restricted to prefix, in 0.90.4 fuzziness was added,
as documented. Fuzzy suggest completion means your query may contain errors
within an edit distance.

But it is still a prefix suggester... You can't mix up the order of words.
This works really well for well-formulated names, e.g. song titles, but for
general search it can fail to match.

It's worth using the completion suggester as the first-line search and
falling back to edge ngrams if there are no matches.
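
The fallback idea can be sketched in Python as follows. The two lookup
functions are in-memory stand-ins, not Elasticsearch calls: prefix_suggest
mimics a prefix-only suggester, and edge_ngram_search mimics a match query
against an edge n-gram field, where each typed word only has to be a prefix
of some word in the title. All names and the sample titles are made up.

```python
TITLES = ["Smells Like Teen Spirit", "Like a Rolling Stone"]


def prefix_suggest(text, size):
    """Stand-in for the completion suggester: whole-input prefix match."""
    q = text.lower()
    return [t for t in TITLES if t.lower().startswith(q)][:size]


def edge_ngram_search(text, size):
    """Stand-in for an edge n-gram query: word order does not matter."""
    words = text.lower().split()
    hits = []
    for title in TITLES:
        tokens = title.lower().split()
        if all(any(tok.startswith(w) for tok in tokens) for w in words):
            hits.append(title)
    return hits[:size]


def autocomplete(text, suggest=prefix_suggest, search=edge_ngram_search, size=5):
    """Try the prefix suggester first; fall back only when it finds nothing."""
    return suggest(text, size) or search(text, size)


if __name__ == "__main__":
    print(autocomplete("smells li"))
    print(autocomplete("teen smells"))
```

The first call is answered by the prefix lookup; the second has its words
out of order, so the prefix lookup returns nothing and the edge n-gram
lookup supplies the match.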
