What is required for partial match to work?

Hello,

I'm fairly new to ElasticSearch and have a question about partial
matching. It is related to an older thread, but that thread didn't
contain the answers I was looking for:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/57f551b0897bf55c/be5276d04f0ba1f5?lnk=gst&q=Partial+Search#be5276d04f0ba1f5

Anyway, I have one document whose title contains the word
"ULTRALIGHT". When I search for "ULTRA" or "LIGHT", I want that
document to be included in the result set.

I am using query_string when searching. How do I make sure that I get
this result? Should I be using Fuzzy or FLT (fuzzy_like_this)?

Thanks,

Raul

Depends on what your data is going to be like. Are these real words, or
usernames?

You can use n-gram, but be careful: depending on your values for n, you can
get lots of matches that may seem unrelated. I use n-gram for usernames.
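
To make the trade-off concrete, here is a minimal Python sketch of what n-gram tokenization produces. It is only an illustration of the idea, not how Elasticsearch implements it; the helper name `ngrams` and the word "flight" are assumptions chosen for the example.

```python
def ngrams(word, min_n, max_n):
    """Return every substring of `word` whose length is between min_n and max_n."""
    word = word.lower()
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(word) - n + 1):
            grams.add(word[start:start + n])
    return grams


# With n = 5, both "ultra" and "light" are among the indexed grams.
title_grams = ngrams("ULTRALIGHT", 5, 5)
print("ultra" in title_grams)   # True
print("light" in title_grams)   # True

# With a small n, an unrelated word shares several grams with the title,
# which is where the "unrelated" matches come from.
shared = ngrams("ULTRALIGHT", 2, 2) & ngrams("flight", 2, 2)
print(sorted(shared))           # ['gh', 'ht', 'ig', 'li']
```

A larger minimum n cuts down on accidental overlap, at the cost of not matching very short search terms.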

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy

Hi Paul,

The data is in the form of articles and free-form text like classified ads.

I will try n-gram and see if it works for me. By the way, I tried FLT,
but I need to investigate why some "unrelated" documents are
matching... maybe it's too fuzzy :)

Regards,
Raul

N-gram will probably be worse for matching unrelated docs...

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy

On Wed, 2011-07-06 at 10:31 +0100, Paul Loy wrote:

N-gram will probably be worse for matching unrelated docs...

Ngrams should usually only be used in the index_analyzer, not the
search_analyzer, to improve result relevancy.

For instance:

We index the text "apple" with an edge ngram analyzer, and get:

  • a,ap,app,appl,apple

When we analyze the search text, we can do it with ngrams or without
ngrams. This is what you would see:

The user wants to search for "application", and starts typing:

           no-ngram         ngram
--------------------------------------------
a          match            match
ap         match            match
app        match            match
appl       match            match
appli      no match         match*
applic     no match         match*

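Those last two results (marked *) are unwanted: "appli" and "applic" are
not prefixes of "apple", but an ngram search analyzer breaks them into
a, ap, app, appl, ..., and those terms do match the index. That is why you
usually don't want to use the ngram version at search time.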
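
The table above can be reproduced with a small Python simulation. This is a sketch under simplifying assumptions: `edge_ngrams` and `matches` are hypothetical helpers, min_gram is assumed to be 1, and a document is treated as a hit if any query term is in the index (OR semantics).

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return the prefixes of `text` from min_gram to max_gram characters."""
    text = text.lower()
    return [text[:n] for n in range(min_gram, min(len(text), max_gram) + 1)]


# Terms stored for the indexed text "apple": a, ap, app, appl, apple
indexed_terms = set(edge_ngrams("apple"))


def matches(query, ngram_at_search):
    """Check whether the query hits the index, with or without ngrams at search time."""
    query_terms = edge_ngrams(query) if ngram_at_search else [query.lower()]
    return any(term in indexed_terms for term in query_terms)


for typed in ["a", "ap", "app", "appl", "appli", "applic"]:
    print(typed, matches(typed, False), matches(typed, True))
```

With ngrams at search time, "appli" still hits because its own prefixes "a" through "appl" are in the index; without them, only the whole typed string is looked up.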

To achieve this, when you specify the mapping for a field, you can do:

{ "my_content": {
    "type": "string",
    "index_analyzer": "my_ngram_analyzer",
    "search_analyzer": "default"
}}
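
The mapping above refers to "my_ngram_analyzer" without defining it. Below is a rough sketch of what the matching index settings might look like, built as Python dictionaries and printed as JSON. The filter name `my_edge_ngram`, the gram sizes, and the choice of the standard tokenizer are all assumptions for illustration. The filter type is written as "edgeNGram" here; the accepted type names and mapping keys differ between Elasticsearch releases, so check the documentation for your version.

```python
import json

# Assumed analysis settings: a custom analyzer that lowercases tokens
# and then emits their edge n-grams (prefixes).
analysis_settings = {
    "analysis": {
        "filter": {
            "my_edge_ngram": {      # hypothetical filter name
                "type": "edgeNGram",
                "min_gram": 1,      # assumed value
                "max_gram": 20,     # assumed value
            }
        },
        "analyzer": {
            "my_ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "my_edge_ngram"],
            }
        },
    }
}

# Field mapping from the post: ngrams at index time only.
field_mapping = {
    "my_content": {
        "type": "string",
        "index_analyzer": "my_ngram_analyzer",
        "search_analyzer": "default",
    }
}

print(json.dumps(analysis_settings, indent=2))
print(json.dumps(field_mapping, indent=2))
```

Note that edge n-grams only cover prefixes, so this setup would let "ULTRA" find "ULTRALIGHT" but not "LIGHT". Matching in the middle or at the end of a word would need a plain n-gram filter instead of the edge variant.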

clint

This works for me:

{ "content": {
    "type": "string",
    "index_analyzer": "ascAnalyzer1",
    "search_analyzer": "default"
}}
