Need Help With AutoSuggest Urgently

Hi All,

I want to build an autosuggest feature.
Example: the user types "Heal Th",
and the suggestions are: I heal the world, heal the world, heal them, ...

With ES, I use edgeNGram for indexing and the default analyzer for searching.
Here is my config:

tokenizer:
  my_gram:
    type: edgeNGram
    side: front
    min_gram: 1
    max_gram: 10
filter:
  my_gram_filter:
    type: edgeNGram
    side: front
    min_gram: 1
    max_gram: 10
analyzer:
  (name not shown):
    tokenizer: standard
    filter: [asciifolding,lowercase]
  auto:
    type: custom
    tokenizer: my_gram
    filter: [asciifolding,lowercase]
  auto2:
    type: custom
    tokenizer: standard
    filter: [standard,lowercase,asciifolding,my_gram_filter]

As you can see:

auto: applies edgeNGram in the tokenizer
auto2: applies edgeNGram as a token filter

For example, with the text "Hello World":

auto produces: h, he, hel, hell, hello, hello , hello w, hello wo, hello
wor, hello worl, hello world
auto2 produces: h, he, hel, hell, hello, w, wo, wor, worl, world
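
To make the difference between the two analyzers concrete, here is a minimal plain-Java sketch that imitates both approaches. It does not use Elasticsearch or Lucene: the names EdgeNGramSketch, tokenizerStyle and filterStyle are made up for illustration, and the word splitting is only a rough stand-in for the standard tokenizer. Note that under this simplified model max_gram: 10 stops the tokenizer-style output at the 10-character gram "hello worl"; the full 11-character string would need a larger max_gram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EdgeNGramSketch {
    // Front edge n-grams of one string, lengths minGram..maxGram,
    // capped at the length of the string itself.
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, text.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    // "auto" style: the whole lowercased input is n-grammed as one token,
    // so grams can contain spaces and span several words.
    static List<String> tokenizerStyle(String text, int minGram, int maxGram) {
        return edgeNGrams(text.toLowerCase(Locale.ROOT), minGram, maxGram);
    }

    // "auto2" style: split into words first, then n-gram each word separately.
    static List<String> filterStyle(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                grams.addAll(edgeNGrams(word, minGram, maxGram));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(tokenizerStyle("Hello World", 1, 10));
        System.out.println(filterStyle("Hello World", 1, 10));
    }
}
```

The tokenizer style only produces prefixes of the whole text, so it can only ever match what the user typed from the very first character. The filter style produces prefixes of every word, which is why it can match "heal" inside "I heal the world".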

My questions are:

#1: Which of the two is better for my purpose?
#2: Which type of search query should I use?

I have tried textPhrasePrefixQuery, prefixQuery, textQuery, textPhraseQuery,
wildcardQuery, fieldQuery, termQuery and spanTermQuery, all with the default
analyzer at search time, and found that:

Only textPhrasePrefixQuery fits my purpose. But it fails with a "too many
clauses" error (the default limit is 1024).

I also tried wildcardQuery and textQuery, but they behave strangely.

For the other queries (all except textPhrasePrefixQuery), I found the
following problems:

  • They don't match queries of two or more words. Example: searching for
    hello returns "hello world", but searching for hello world returns
    nothing.

  • They don't match a partial phrase that ends in a space. Example:
    searching for "hello " returns nothing.

  • They don't match exactly. Example: searching for "hello w" returns
    "hello abc".

I hope someone can help me with the following:

  1. Which analyzer and filters should I use for indexing and for searching
    to achieve this?
  2. How does ES compare the output of the search analyzer with the output of
    the index analyzer to find matches?

Thanks in advance.

Sang Dang.


Just a few hints:

  • Please avoid min_gram=1. I doubt you will ever want to autosuggest from
    a single character, and indexing one-character grams is very expensive.

  • "auto2" is what you want. Note that autocompletion with edge n-grams is a
    per-word suggestion algorithm. There is no easy way to suggest phrases
    with edge n-grams (for that purpose a custom phrase dictionary in an FSA
    is preferable).

  • Use an n-gram analyzer only for indexing, not for search. Then you can
    use a simple "match" ("text") query. MatchPhrasePrefixQuery is very
    expensive for autocompletion.

  • If you are interested, have a look at my autocomplete solution.
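
The hint about using the n-gram analyzer only at index time also bears on the question of how the two analyzers interact: the terms produced by the search analyzer are looked up among the terms the index analyzer stored. The plain-Java sketch below imitates that lookup. It is only a toy model, not Elasticsearch code; AnalyzerMatchSketch and its methods are hypothetical names, and min_gram is assumed to be 2.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerMatchSketch {
    // Lowercase and split on non-word characters
    // (a rough stand-in for a standard analyzer).
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Front edge n-grams of every word, as an n-gram token filter would emit.
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String w : words(text)) {
            for (int len = minGram; len <= Math.min(maxGram, w.length()); len++) {
                grams.add(w.substring(0, len));
            }
        }
        return grams;
    }

    // A document matches if at least one search term equals an indexed term.
    static boolean matches(Set<String> indexedTerms, List<String> searchTerms) {
        for (String term : searchTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(edgeNGrams("help desk", 2, 10));
        // Plain search analysis: "hello" is looked up as-is and finds nothing.
        System.out.println(matches(indexed, words("hello")));
        // N-gram search analysis: the gram "he" was also indexed for "help",
        // so the document matches although the user typed a different word.
        System.out.println(matches(indexed, edgeNGrams("hello", 2, 10)));
    }
}
```

In this model, n-gramming the search text turns every query into a set of short prefixes, which is where unrelated suggestions come from.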

Hope this helps,





Thanks for your reply.
As I said above, I have checked the other query types, and they have
problems that make them unsuitable for my purpose:

Data: Nothing else matter, Nothing at all
Query: Nothing e

If I use a text query, it returns both "Nothing else matter" and
"Nothing at all", while I only want "Nothing else matter".

Here is my code:

TextQueryBuilder tq = QueryBuilders.textQuery("field_test", query);
SearchRequestBuilder srb = cli.prepareSearch("index_test")
        .setQuery(tq); // attach the text query to the search request
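
A likely reason both titles come back is that the match (text) query combines the analyzed terms with OR by default, so a hit on "nothing" alone is enough. The query has an operator option that can be set to "and". The plain-Java sketch below imitates both behaviours on the data from this post. It is a toy model, not the Elasticsearch API: OperatorSketch and its methods are hypothetical names, it assumes per-word edge n-grams (min_gram 1, max_gram 10) at index time and a plain, non-n-gram analyzer at search time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class OperatorSketch {
    // Lowercase and split on non-word characters (plain search-time analysis).
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Index-time terms: front edge n-grams of every word in the document.
    static Set<String> indexTerms(String doc, int minGram, int maxGram) {
        Set<String> terms = new HashSet<>();
        for (String w : words(doc)) {
            for (int len = minGram; len <= Math.min(maxGram, w.length()); len++) {
                terms.add(w.substring(0, len));
            }
        }
        return terms;
    }

    // requireAll = false imitates operator "or", true imitates operator "and".
    static List<String> search(List<String> docs, String query, boolean requireAll) {
        List<String> hits = new ArrayList<>();
        List<String> terms = words(query);
        if (terms.isEmpty()) {
            return hits; // nothing to look up
        }
        for (String doc : docs) {
            Set<String> indexed = indexTerms(doc, 1, 10);
            int found = 0;
            for (String term : terms) {
                if (indexed.contains(term)) {
                    found++;
                }
            }
            boolean hit = requireAll ? found == terms.size() : found > 0;
            if (hit) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Nothing else matter", "Nothing at all");
        System.out.println(search(docs, "Nothing e", false)); // OR: both titles
        System.out.println(search(docs, "Nothing e", true));  // AND: first title only
    }
}
```

Keep in mind that AND only requires every term to be present. It does not check word order or adjacency, so in this model "else nothing" would match "Nothing else matter" as well.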


In reply to Jörg Prante's message of Saturday, December 1, 2012 8:08:56 AM UTC+7.




I've been grappling with the same problem:

  • match_phrase_prefix has performance / exception problems with larger
    data sets, but provides perfect results
  • nGrams provide good performance on the same data, but can't match
    across word boundaries

My current (but hacky) solution is to cap the number of expansions
performed by match_phrase_prefix at 1023:

"match_phrase_prefix" : {
    "message" : {
        "query" : "this is a test",
        "max_expansions" : 1023
    }
}

This will stop the exception from being triggered, but will still have performance problems with some searches. You could work around this with caching of the query results?

An alternative I have been considering is to have my search tool send an nGram query if the search phrase is <=3 characters, then switch to match_phrase_prefix after that.
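
A rough sketch of that switching idea in plain Java, in case it helps anyone. It only builds the JSON for the query clause and does not talk to Elasticsearch. The class SuggestQueryChooser, the method names and the "message.ngram" field are assumptions for illustration, and the JSON escaping is deliberately simplistic (it handles only backslashes and double quotes).

```java
final class SuggestQueryChooser {
    // Phrases up to this length go to the n-gram field (threshold from the post).
    static final int NGRAM_MAX_LENGTH = 3;

    // Minimal escaping so the phrase can sit inside a JSON string literal.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Returns a "match" clause for short input and a capped
    // "match_phrase_prefix" clause for longer input.
    static String buildQuery(String field, String ngramField, String phrase) {
        String text = escapeJson(phrase);
        if (phrase.length() <= NGRAM_MAX_LENGTH) {
            return "{\"match\":{\"" + ngramField + "\":{\"query\":\"" + text + "\"}}}";
        }
        return "{\"match_phrase_prefix\":{\"" + field + "\":{\"query\":\"" + text
                + "\",\"max_expansions\":1023}}}";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("message", "message.ngram", "th"));
        System.out.println(buildQuery("message", "message.ngram", "this is a test"));
    }
}
```

The point of the split is that very short prefixes are the ones that expand into the most terms, so those go to the cheap n-gram lookup and only longer input pays for match_phrase_prefix.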

Hope this is of some help.


In reply to kidkid's message of Monday, December 3, 2012 4:04:16 AM UTC.

