Hi everyone. I'm relatively new to Elasticsearch (and Lucene in general),
but picking it up fairly quickly. It's a lovely piece of software!
I was curious if I could get some advice about mapping "best practices" for
general search. I am searching an index of product names for matching or
similar products. The mapping that I have on the index is fairly basic:
ElasticSearch Mapping · GitHub
Effectively, I'm doing basic tokenization and then either phrase matching
or matching term nGrams. I am boosting full phrase matches because that
seems to give the best relevance when people know exactly what they are
searching for, while the low-weighted nGrams help for non-match phrases.
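To make the idea concrete, here is a minimal sketch of the kind of query this describes: one boosted phrase clause plus low-weighted clauses against the nGram fields. It only builds the request body as a Python dict. The field names (`name`, `name.front_ngram`, and so on) and the boost values are hypothetical placeholders, not taken from the actual mapping, and the clause names follow the query DSL of recent Elasticsearch versions, which may differ from older releases.

```python
def build_product_query(text, phrase_boost=10.0, ngram_boost=0.5):
    """Combine a boosted exact-phrase clause with low-weighted n-gram clauses."""
    # Hypothetical sub-field names; the real ones depend on the mapping.
    ngram_fields = ["name.front_ngram", "name.middle_ngram", "name.back_ngram"]

    # Full phrase matches get the high boost so they rank first.
    should = [
        {"match_phrase": {"name": {"query": text, "boost": phrase_boost}}}
    ]

    # Each n-gram field contributes a weak signal for partial matches.
    for field in ngram_fields:
        should.append({"match": {field: {"query": text, "boost": ngram_boost}}})

    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

Calling `build_product_query("3C 2200mAh Nanotech")` returns a body with four `should` clauses; any one of them matching is enough for a hit, and the phrase clause dominates the score when it matches.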
Is there anything that I'm doing sub-optimally or something I should add?
Is it silly to have all three nGrams - front, back and middle? This
search is exclusively for products, so there are a lot of strange queries
("3C 2200mAh Nanotech"). Unfortunately, a lot of those small terms ("3C")
are very important so I have to make sure they aren't filtered out.
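As a rough way to reason about the two questions above, the sketch below models the three gram types in plain Python. It is not Lucene's implementation, just an illustration under the assumption that all three filters use the same `min_gram`/`max_gram` range. Under that assumption the front and back grams are always a subset of the "middle" (full) grams, and a token shorter than `min_gram` produces no grams at all, which is how a term like "3C" could silently drop out of the nGram fields.

```python
def all_ngrams(token, min_gram, max_gram):
    """Every substring of length min_gram..max_gram (the 'middle' grams)."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def front_ngrams(token, min_gram, max_gram):
    """Prefixes of length min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def back_ngrams(token, min_gram, max_gram):
    """Suffixes of length min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return {token[-n:] for n in range(min_gram, longest + 1)}


def edge_grams_are_redundant(token, min_gram, max_gram):
    """True when front and back grams add nothing beyond the full grams."""
    full = all_ngrams(token, min_gram, max_gram)
    return (front_ngrams(token, min_gram, max_gram) <= full
            and back_ngrams(token, min_gram, max_gram) <= full)
```

With `min_gram=2, max_gram=4`, the token `"2200mah"` yields front grams `22`, `220`, `2200` and back grams `ah`, `mah`, `0mah`, all of which also appear in the full set. Separate edge fields can still be useful if they are boosted differently from the middle grams, but they do not match anything extra by themselves.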
Thanks! Any help would be greatly appreciated!
-Zach
I think, assuming that the comparerc.com site is based on this, that you
should lowercase at both index and search time. I get no hits for e.g. "wheel".
greetings
Runar Myklebust
Enonic AS
An Open Source Company www.enonic.com/download
On Wed, Jun 27, 2012 at 2:21 AM, Zachary Tong zacharyjtong@gmail.com wrote:
Hi everyone. I'm relatively new to Elasticsearch (and Lucene in general),
but picking it up fairly quickly. It's a lovely piece of software!
I was curious if I could get some advice about mapping "best practices"
for general search. I am searching an index of product names for matching
or similar products. The mapping that I have on the index is fairly basic:
ElasticSearch Mapping · GitHub
Effectively, I'm doing basic tokenization and then either phrase matching
or matching term nGrams. I am boosting full phrase matches because that
seems to give the best relevance when people know exactly what they are
searching for, while the low-weighted nGrams help for non-match phrases.
Is there anything that I'm doing sub-optimally or something I should add?
Is it silly to have all three nGrams - front, back and middle? This
search is exclusively for products, so there are a lot of strange queries
("3C 2200mAh Nanotech"). Unfortunately, a lot of those small terms ("3C")
are very important so I have to make sure they aren't filtered out.
Thanks! Any help would be greatly appreciated!
-Zach
Hehe, that is indeed my site. How'd you find it?
In any case, my analyzers are lowercasing. If you look at the mapping,
I'm applying [ "standard", "lowercase", "asciifolding" ] to all the
indexed product names.
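For anyone following along, here is a rough plain-Python approximation of what that chain does to a product name. It is only a sketch: the regex tokenizer and the Unicode-decomposition folding are stand-ins I am assuming for illustration, not the actual Lucene tokenizer or ASCII folding filter, which handle many more cases.

```python
import re
import unicodedata


def analyze(text):
    """Approximate tokenize -> lowercase -> fold accents to ASCII."""
    # Stand-in tokenizer: split on anything that is not a word character.
    tokens = re.findall(r"\w+", text)

    analyzed = []
    for token in tokens:
        token = token.lower()
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", token)
        folded = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
        analyzed.append(folded)
    return analyzed
```

Because the same steps run on both the indexed names and the query text, `analyze("Wheel")` and `analyze("wheel")` produce the same token, so case alone should not cause a miss.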
I'm also retrieving search results for "wheel"...what browser are you
using? It may just be a problem with my javascript, unrelated to
Elasticsearch.
-Zach
On Friday, June 29, 2012 8:35:23 AM UTC-4, Runar Myklebust wrote:
I think, assuming that the comparerc.com site is based on this, that you
should lowercase at both index and search time. I get no hits for e.g. "wheel".
greetings
Runar Myklebust
Enonic AS
An Open Source Company www.enonic.com/download