What is required for partial match to work?


(Raul, Jr. Martinez) #1

Hello,

I'm fairly new to ElasticSearch and have a question about partial
matching. It is related to an older thread, but that thread didn't
contain the answers I was looking for:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/57f551b0897bf55c/be5276d04f0ba1f5?lnk=gst&q=Partial+Search#be5276d04f0ba1f5

I have one document whose title contains the word "ULTRALIGHT". When I
search for "ULTRA" or "LIGHT", that document should be included in the
result set.

I am using query_string when searching. How do I make sure that I get
this result? Should I be using Fuzzy or FLT (fuzzy_like_this)?

Thanks,

Raul


(Paul Loy) #2

It depends on what your data is going to be like. Are these real words, or
usernames?

You can use n-grams, but be careful: depending on the values you choose for
n, you can get lots of matches that seem unrelated. I use n-grams for
usernames.
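
To make the trade-off concrete, here is a minimal Python sketch of character n-grams. It is not ElasticSearch code: the helper name `char_ngrams` and the gram sizes are assumptions chosen for illustration. It shows that "ULTRA" and "LIGHT" both appear among the grams of "ULTRALIGHT", and that a small minimum n makes an unrelated word such as "TRAIN" share grams with it.

```python
def char_ngrams(token, min_n, max_n):
    """Return the set of character n-grams of token with min_n <= n <= max_n."""
    token = token.lower()
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


# Both partial words are among the grams that would be indexed.
indexed = char_ngrams("ULTRALIGHT", 3, 10)
print("ultra" in indexed, "light" in indexed)

# With a small minimum n, an unrelated word shares short grams with the title.
shared_small_n = char_ngrams("ULTRALIGHT", 2, 10) & char_ngrams("TRAIN", 2, 5)
print(sorted(shared_small_n))

# Raising the minimum n removes the overlap for this pair of words.
shared_large_n = char_ngrams("ULTRALIGHT", 4, 10) & char_ngrams("TRAIN", 4, 5)
print(sorted(shared_large_n))
```

A larger minimum n cuts down accidental overlap, at the cost of no longer matching very short search terms.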

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(Raul, Jr. Martinez) #3

Hi Paul,

The data is in the form of articles and free-form text like classified ads.

I will try n-grams and see if they work for me. By the way, I used FLT,
but it seems I need to investigate why some "unrelated" documents are
matching... maybe it's too fuzzy :)

Regards,
Raul



(Paul Loy) #4

N-gram will probably be worse for matching unrelated docs...



(Clinton Gormley) #5

On Wed, 2011-07-06 at 10:31 +0100, Paul Loy wrote:

N-gram will probably be worse for matching unrelated docs...

Ngrams should usually only be used in the index_analyzer, not the
search_analyzer, to improve result relevancy.

For instance:

We index the text "apple" with an edge ngram analyzer, and get:

  • a,ap,app,appl,apple

When we analyze the search text, we can do it with ngrams or without
ngrams. This is what you would see:

The user wants to search for "application", and starts typing:

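typed      search without ngrams   search with ngrams
-----------------------------------------------------
a          match                   match
ap         match                   match
app        match                   match
appl       match                   match
appli      no match                match*
applic     no match                match*

The last two results (marked *) are false positives: the user has typed
past "appl", yet "apple" still matches because the query's own ngrams
(a, ap, app, appl) are in the index. That is why you usually don't want
to use the ngram version at search time.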
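
The table above can be reproduced with a small Python simulation. This is only a sketch of the matching logic, not ElasticSearch itself; the function names are hypothetical, and it assumes a query matches when any one of its analyzed terms is present in the index.

```python
def edge_ngrams(token, min_n=1, max_n=10):
    """Return the prefixes of token from length min_n up to max_n."""
    token = token.lower()
    return [token[:n] for n in range(min_n, min(max_n, len(token)) + 1)]


# Terms stored in the index for the document text "apple".
INDEXED_TERMS = set(edge_ngrams("apple"))


def matches_without_ngrams(typed):
    """Search side keeps the typed text as a single term."""
    return typed.lower() in INDEXED_TERMS


def matches_with_ngrams(typed):
    """Search side also expands the typed text into edge ngrams (assumed OR)."""
    return any(gram in INDEXED_TERMS for gram in edge_ngrams(typed))


# Simulate the user typing "application" one character at a time.
for length in range(1, 7):
    typed = "application"[:length]
    print(f"{typed:<10} {matches_without_ngrams(typed)!s:<8} {matches_with_ngrams(typed)}")
```
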

To index with ngrams but search without them, specify both analyzers in
the mapping for the field:

{ "my_content": {
    "type": "string",
    "index_analyzer": "my_ngram_analyzer",
    "search_analyzer": "default"
}}
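
The mapping refers to my_ngram_analyzer without showing its definition. Below is a hedged sketch of what the index settings might look like, built as a Python dict and printed as JSON. The filter name `my_ngram_filter`, the type name `my_type`, and the gram sizes are assumptions for illustration. The filter type spelling varies by release (older versions use "nGram" and "edgeNGram", newer ones "ngram" and "edge_ngram"), so check it against the version in use. A full ngram filter is used here rather than an edge ngram, since matching "LIGHT" inside "ULTRALIGHT" needs grams that start mid-word.

```python
import json

# Hypothetical index definition; names and sizes are illustrative only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Full ngram filter: emits grams starting at every position.
                "my_ngram_filter": {"type": "nGram", "min_gram": 3, "max_gram": 10}
            },
            "analyzer": {
                "my_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_ngram_filter"],
                }
            },
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "my_content": {
                    "type": "string",
                    # Ngrams at index time only, as recommended in this thread.
                    "index_analyzer": "my_ngram_analyzer",
                    "search_analyzer": "default",
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```
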

clint


(Raul, Jr. Martinez) #6

This works for me.

{ "content": {
    "type": "string",
    "index_analyzer": "ascAnalyzer1",
    "search_analyzer": "default"
}}


