Need Help With AutoSuggest Urgently

Hi All,

I want to build an autosuggest feature.
Example: the user types "Heal Th",
and the suggestions are: I heal the world, heal the world, heal them, ...

With ES, I use edgeNGram for indexing and the default analyzer for search.
Here is my config:

index:
  analysis:
    filter:
      my_gram_filter:
        type: edgeNGram
        side: front
        min_gram: 1
        max_gram: 10
    tokenizer:
      my_gram:
        type: edgeNGram
        side: front
        min_gram: 1
        max_gram: 10
    analyzer:
      default:
        tokenizer: standard
        filter: [asciifolding,lowercase]
      auto:
        type: custom
        tokenizer: my_gram
        filter: [asciifolding,lowercase]
      auto2:
        type: custom
        tokenizer: standard
        filter: [standard,lowercase,asciifolding,my_gram_filter]

As you can see:

auto: applies edgeNGram in the tokenizer
auto2: applies edgeNGram as a token filter

For example, with the test input "Hello World":

auto will produce: h, he, hel, hell, hello, "hello ", hello w, hello wo, hello wor, hello worl, hello world
auto2 will produce: h, he, hel, hell, hello, w, wo, wor, worl, world
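
To make the difference between the two analyzers concrete, here is a minimal Python sketch that approximates them outside of Elasticsearch. It is only an illustration: the function names are made up, asciifolding is skipped, and the "standard" tokenizer is reduced to a simple word split. One thing it suggests is that with max_gram: 10 the longest gram "auto" emits would be "hello worl" (10 characters), not the 11-character "hello world".

```python
import re

MIN_GRAM, MAX_GRAM = 1, 10  # values taken from the config in this post


def edge_ngrams(text, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Return the front edge n-grams of text, shortest first."""
    upper = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, upper + 1)]


def analyze_auto(text):
    """Approximate 'auto': edge n-gram the whole input, then lowercase."""
    return [gram.lower() for gram in edge_ngrams(text)]


def analyze_auto2(text):
    """Approximate 'auto2': split into words, lowercase, edge n-gram each word."""
    words = re.findall(r"\w+", text.lower())
    return [gram for word in words for gram in edge_ngrams(word)]


auto_terms = analyze_auto("Hello World")
auto2_terms = analyze_auto2("Hello World")
print(auto_terms)
print(auto2_terms)
```

In this approximation, "auto" keeps the space inside its grams, so its terms only line up with a query that starts at the beginning of the field, while "auto2" produces independent per-word prefixes.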

My questions are:

#1: Which one is better for suggestions, given the purpose above?
#2: Which type of search request should I use?

I have tried textPhrasePrefixQuery, prefixQuery, textQuery, textPhraseQuery, wildcardQuery, fieldQuery, termQuery and spanTermQuery,
all with the default analyzer, and I found that:

Only textPhrasePrefixQuery matches my purpose. But it also causes a problem:
the "too many clauses" failure (default limit 1024).

I have also tried the wildcard and text queries, but they behave strangely.

For the other queries (except textPhrasePrefixQuery), I checked and found
several problems, as below:

They don't match queries of two or more words. ex: searching hello returns "hello world", but searching hello world returns nothing.

They don't match a partial word. ex: searching "hello " returns nothing.

They don't match exactly. ex: searching "hello w" returns "hello abc".

I hope someone can help me with the following:

  1. Which analyzer and filter should I use for indexing and for search,
    given the purpose above?
  2. How does ES compare the search analyzer's output with the index
    analyzer's output to match results?

Thanks in advance.

Sang Dang.

--

Just a few hints:

  • please try to avoid min_gram=1. I think you will never autosuggest words
    with a length of just 1, and it is very expensive.

  • "auto2" is what you want. Note that autocompletion with edge n-grams is a
    per-word suggestion algorithm. There is no easy solution for suggesting
    phrases based on edge n-grams (for this purpose you would prefer a custom
    phrase dictionary in an FSA)

  • use n-gram analyzers only for indexing, not for search. Then you can use
    a simple "match" ("text") query. MatchPhrasePrefixQuery is very expensive
    for autocompletion

  • If you are interested, have a look
    at http://jprante.github.com/applications/2012/08/17/Autocompletion-with-jQuery-JAX-RS-and-Elasticsearch.html
    for my autocomplete solution
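
The idea behind the third hint can be sketched in a few lines of Python. This is a toy model, not Elasticsearch itself: the helper names are hypothetical, the analyzers are simplified, and min_gram is set to 2 following the first hint. Documents are expanded into edge n-grams at index time only; the query is merely split into lowercase words, and a document matches when a query word is found as an exact term in the index.

```python
import re
from collections import defaultdict


def standard_tokens(text):
    """Search-time analysis (simplified): lowercase words, no n-grams."""
    return re.findall(r"\w+", text.lower())


def index_tokens(text, min_gram=2, max_gram=10):
    """Index-time analysis (simplified): front edge n-grams of every word."""
    grams = []
    for word in standard_tokens(text):
        for n in range(min_gram, min(len(word), max_gram) + 1):
            grams.append(word[:n])
    return grams


def build_index(docs):
    """Map every indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in index_tokens(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """A document is a hit if any query word equals one of its indexed terms."""
    hits = set()
    for term in standard_tokens(query):
        hits |= index.get(term, set())
    return hits


docs = ["I heal the world", "Heal the world", "Heal them", "Hello world"]
index = build_index(docs)
print(sorted(match(index, "Heal Th")))
```

Because the prefixes are already stored in the index, the search side needs no prefix expansion at all; a partial word such as "th" is just an ordinary term lookup. With min_gram=2, a single typed letter matches nothing in this model.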

Hope this helps,

Cheers,

Jörg

--

Hi,

Thanks for your reply.
As I said above, I have checked the other query types, and they have some
problems that make them unsuitable for my purpose:

ex:
Data: Nothing else matter, Nothing at all
Query: Nothing e

If I use the text query, it returns both "Nothing else matter" and
"Nothing at all", while I only want "Nothing else matter".
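
A likely explanation, sketched below in Python under simplifying assumptions (made-up helper names, a toy analyzer, min_gram=1 as in the config above): by default the query words are combined with OR, so the word "nothing" alone is enough to match both titles. Requiring all query words narrows the result to the one you want. As far as I know, the text/match query has an operator option for this, but please check the documentation for your version.

```python
import re


def standard_tokens(text):
    """Search-time analysis (simplified): lowercase words."""
    return re.findall(r"\w+", text.lower())


def edge_ngram_terms(text, min_gram=1, max_gram=10):
    """Index-time analysis (simplified): front edge n-grams of every word."""
    return {
        word[:n]
        for word in standard_tokens(text)
        for n in range(min_gram, min(len(word), max_gram) + 1)
    }


def search(docs, query, operator="or"):
    """Return docs whose indexed terms contain any ("or") or all ("and") query words."""
    query_terms = standard_tokens(query)
    combine = all if operator == "and" else any
    return [
        doc for doc in docs
        if combine(term in edge_ngram_terms(doc) for term in query_terms)
    ]


docs = ["Nothing else matter", "Nothing at all"]
print(search(docs, "Nothing e"))                  # OR: both titles
print(search(docs, "Nothing e", operator="and"))  # AND: only the first title
```

Note that the AND variant still ignores word order, so in this model "e nothing" would return the same title. That is the per-word limitation of edge n-grams mentioned earlier in the thread.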

Here is my code:

TextQueryBuilder tq = QueryBuilders.textQuery("field_test", query);
SearchRequestBuilder srb = cli.prepareSearch("index_test")
        .setQuery(tq)
        .setTypes("type_test");

Thanks.

(In reply to Jörg Prante's message of Saturday, December 1, 2012 8:08:56 AM UTC+7.)

--

I've been grappling with the same problem:

  • match_phrase_prefix has performance / exception problems with larger
    data sets, but provides perfect results
  • nGrams provide good performance on the same data, but can't match
    across word boundaries

My current (but hacky) solution is to cap the number of expansions
performed by match_phrase_prefix at 1023:

{
  "match_phrase_prefix" : {
    "message" : {
      "query" : "this is a test",
      "max_expansions" : 1023
    }
  }
}

This will stop the exception from being triggered, but will still have performance problems with some searches. You could work around this with caching of the query results?
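
As I understand it, the exception comes from the last word of the phrase being expanded into every index term that starts with it, with each expansion counting as one clause. Here is a rough Python sketch of that expansion and of what the cap does. The function name and the generated terms are made up for illustration; this is not how Elasticsearch is implemented internally.

```python
from bisect import bisect_left

MAX_CLAUSE_COUNT = 1024  # default limit mentioned earlier in this thread


def expand_prefix(sorted_terms, prefix, max_expansions=None):
    """Collect index terms starting with prefix, stopping at max_expansions."""
    start = bisect_left(sorted_terms, prefix)
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        if max_expansions is not None and len(expansions) >= max_expansions:
            break
        expansions.append(term)
    return expansions


# Hypothetical term dictionary with 2000 terms sharing the prefix "t".
terms = sorted(f"t{i:04d}" for i in range(2000))

uncapped = expand_prefix(terms, "t")
capped = expand_prefix(terms, "t", max_expansions=1023)
print(len(uncapped) > MAX_CLAUSE_COUNT)  # would exceed the clause limit
print(len(capped) < MAX_CLAUSE_COUNT)    # stays below the limit
```

The trade-off is that a capped expansion silently drops the remaining terms, so a very short last word can miss valid suggestions.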

An alternative I have been considering is to have my search tool send an nGram query if the search phrase is <=3 characters, then switch to match_phrase_prefix after that.
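
A minimal sketch of that switch in Python, building the request body only. The field name "message.autocomplete" is an assumption (a hypothetical field indexed with an edge n-gram analyzer), as are the function name and its defaults.

```python
def build_suggest_query(phrase,
                        ngram_field="message.autocomplete",  # hypothetical n-gram field
                        phrase_field="message",
                        short_limit=3,
                        max_expansions=1023):
    """Pick a cheap n-gram match for short input, match_phrase_prefix otherwise."""
    text = phrase.strip()
    if len(text) <= short_limit:
        # Short input: rely on the edge n-grams stored at index time.
        return {"match": {ngram_field: {"query": text}}}
    # Longer input: phrase prefix query with a capped expansion count.
    return {
        "match_phrase_prefix": {
            phrase_field: {"query": text, "max_expansions": max_expansions}
        }
    }


print(build_suggest_query("he"))
print(build_suggest_query("this is a test"))
```

The short-input branch avoids the widest prefix expansions, which are the ones most likely to be slow or to hit the clause limit.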

Hope this is of some help.

Doug

(In reply to kidkid's message of Monday, December 3, 2012 4:04:16 AM UTC.)

--