I have one document whose title contains the word "ULTRALIGHT". I want
that document to be included in the result set when I search for
"ULTRA" or "LIGHT".
I am using query_string when searching. How do I make sure that I get
this result? Should I be using fuzzy or FLT (fuzzy_like_this)?
It depends on what your data is going to be like. Are these real words, or
usernames?
You can use n-grams, but be careful: depending on your values for n, you can
get lots of matches that seem unrelated. I use n-grams for usernames.
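To make the trade-off concrete, here is a rough Python sketch of what n-gramming does to a term. It is not Elasticsearch code; the `ngrams` helper and the choice of n = 2..5 are assumptions made up for illustration. It shows that "ultra" and "light" are both among the grams of "ULTRALIGHT", and that small values of n let an unrelated word such as "night" share grams with it.

```python
def ngrams(text, min_n, max_n):
    """Return the set of all substrings of `text` with length min_n..max_n."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    }

# Grams that would be stored for the title term (illustrative range of n).
indexed = ngrams("ULTRALIGHT", 2, 5)

# A whole query term matches if it is one of the indexed grams.
print("ultra" in indexed)
print("light" in indexed)

# If the query is n-grammed as well, short grams cause loose matches:
# "night" is unrelated to "ultralight" but still shares several grams.
shared = ngrams("night", 2, 5) & indexed
print(sorted(shared))
```

Raising the minimum n reduces this kind of overlap, at the cost of no longer matching very short search terms.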
On Mon, Jul 4, 2011 at 8:54 AM, rmartinez juneym@gmail.com wrote:
Hello,
I'm pretty new to Elasticsearch. My question is about partial
matching. It is related to an older post, but that thread didn't
contain the answers I was looking for:
Anyway, I have one document with title containing the word
"ULTRALIGHT". I want to make sure that when I search for "ULTRA" or
"LIGHT", the said document should be included in the result set.
I am using query_string when searching. How do I go about making sure
that I get this result? Should I be using Fuzzy or FLT?
The data is in the form of articles and free-form text like classified ads.
I will try n-grams and see if that works for me. By the way, I tried FLT,
but it seems I need to investigate why some "unrelated" documents are
matching. Maybe it's too fuzzy.
Thanks,
Raul
--
Paul Loy
p...@keteracel.com http://uk.linkedin.com/in/paulloy
N-gram will probably be worse for matching unrelated docs...
On Wed, 2011-07-06 at 10:31 +0100, Paul Loy wrote:
N-gram will probably be worse for matching unrelated docs...
Ngrams should usually only be used in the index_analyzer, not the
search_analyzer, to improve result relevancy.
For instance:
We index the text "apple" with an edge ngram analyzer, and get:
a,ap,app,appl,apple
When we analyze the search text, we can do it with ngrams or without
ngrams. This is what you would see:
The user wants to search for "application", and starts typing:
          no-ngram    ngram
--------------------------------------------
a         match       match
ap        match       match
app       match       match
appl      match       match
appli     no match    match*
applic    no match    match*
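Those last two results (marked *) are false positives: when the search
text is ngrammed too, "appli" is broken into a,ap,app,appl,appli, and the
shorter grams still match the indexed grams of "apple". That is why you
usually don't want to use the ngram version at search time.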
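The comparison above can be reproduced with a small Python simulation. This is only a sketch of the matching logic, not how Elasticsearch is implemented; `edge_ngrams` and `matches` are hypothetical helper names, and the 1..20 gram range is an assumed setting.

```python
def edge_ngrams(term, min_n=1, max_n=20):
    """Return the prefixes of `term` with length min_n..max_n."""
    term = term.lower()
    return [term[:n] for n in range(min_n, min(max_n, len(term)) + 1)]

# What the index would hold for the text "apple".
indexed = set(edge_ngrams("apple"))

def matches(typed, ngram_at_search):
    """True if any term produced from the search text is in the index."""
    query_terms = edge_ngrams(typed) if ngram_at_search else [typed.lower()]
    return any(t in indexed for t in query_terms)

# Columns: typed text, match without search-time ngrams, match with them.
for typed in ["a", "ap", "app", "appl", "appli", "applic"]:
    print(f"{typed:8} {matches(typed, False)!s:8} {matches(typed, True)}")
```

Without search-time ngrams, "appli" is looked up as a single term and finds nothing; with them, its shorter prefixes hit the grams stored for "apple".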
To achieve this, when you specify the mapping for a field, you can do:
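The mapping that originally followed is not preserved in this copy of the thread. As a stand-in, here is a hedged sketch of what such index settings and mapping might look like, built as a Python dict and printed as JSON. It is not the poster's original mapping. The analyzer, filter, type and field names (`title_index`, `title_search`, `title_edge_ngram`, `article`, `title`) are made up, and the `edgeNGram` filter type, the `string` field type and the `index_analyzer` key are assumptions based on the Elasticsearch versions of that era; as far as I know, later versions use `analyzer` in place of `index_analyzer`, so check the documentation for your version.

```python
import json

# Hypothetical index body: edge ngrams applied at index time only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Assumed filter definition; gram sizes are illustrative.
                "title_edge_ngram": {
                    "type": "edgeNGram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                # Used when indexing: lowercase, then emit edge ngrams.
                "title_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "title_edge_ngram"],
                },
                # Used when searching: lowercase only, no ngrams.
                "title_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "article": {
            "properties": {
                "title": {
                    "type": "string",
                    "index_analyzer": "title_index",
                    "search_analyzer": "title_search",
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The point of the design is only that the two analyzers differ: the index side produces the grams, while the search side leaves the typed text as a single term.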