Hi everyone. I'm relatively new to ElasticSearch (and Lucene in general),
but picking it up fairly quickly. It's a lovely piece of software!
I was curious if I could get some advice about mapping "best practices" for
general search. I am searching an index of product names for matching or
similar products. The mapping that I have on the index is fairly basic:
ElasticSearch Mapping · GitHub
Effectively, I'm doing basic tokenization and then either phrase matching
or matching on term nGrams. I am boosting full phrase matches because that
seems to give the best relevance when people know exactly what they are
searching for, while the low-weighted nGrams help with queries that don't
match a full phrase.
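A minimal sketch of the scoring idea described above, in plain Python rather than Elasticsearch: an exact phrase hit earns a large boost, and each shared character n-gram earns a small amount of partial credit. The weights (PHRASE_BOOST, NGRAM_WEIGHT), the 2 to 3 character gram range and the helper names are all hypothetical; Lucene's own scoring is considerably more involved.

```python
import re

PHRASE_BOOST = 10.0  # hypothetical weight for an exact phrase hit
NGRAM_WEIGHT = 0.1   # hypothetical weight per shared n-gram


def tokens(text):
    # Split on anything that is not an ASCII letter or digit, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def ngrams(term, min_gram=2, max_gram=3):
    # All substrings of length min_gram..max_gram ("middle" n-grams).
    return {term[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(term) - n + 1)}


def contains_phrase(doc_tokens, query_tokens):
    # True if the query tokens appear consecutively and in order in the doc.
    k = len(query_tokens)
    return k > 0 and any(doc_tokens[i:i + k] == query_tokens
                         for i in range(len(doc_tokens) - k + 1))


def score(doc, query):
    d, q = tokens(doc), tokens(query)
    s = PHRASE_BOOST if contains_phrase(d, q) else 0.0
    # Low-weighted partial credit: distinct n-grams shared by doc and query.
    doc_grams = set().union(*(ngrams(t) for t in d))
    query_grams = set().union(*(ngrams(t) for t in q))
    return s + NGRAM_WEIGHT * len(doc_grams & query_grams)


query = "Nanotech 2200mAh"
for doc in ["Nanotech 2200mAh 3C Battery", "Turnigy 2200mAh 3S Pack"]:
    print(doc, score(doc, query))
```

The first product wins because it contains the exact phrase; a misspelled query such as "nanotek" gets no phrase boost but still scores above zero through shared n-grams, which is the fallback behaviour described above.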
Is there anything that I'm doing sub-optimally or something I should add?
Is it silly to have all three nGram types: front, back and middle? This
search is exclusively for products, so there are a lot of strange queries
("3C 2200mAh Nanotech"). Unfortunately, a lot of those short terms ("3C")
are very important, so I have to make sure they aren't filtered out.
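Whether all three nGram types are needed can be checked with a small pure-Python sketch. It assumes the front and back filters produce prefixes and suffixes, the middle filter produces every substring, and all three share the same min_gram/max_gram. The actual settings in the mapping are not visible here, so the values and function names below are hypothetical. Under that assumption the edge grams are a subset of the middle grams; what separate edge fields would still add is an independently weightable signal for prefix and suffix matches. The last lines show why min_gram matters for short terms such as "3C".

```python
def front_edge_ngrams(term, min_gram, max_gram):
    # Prefixes of length min_gram..max_gram ("front" edge n-grams).
    # min_gram is assumed to be >= 1.
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def back_edge_ngrams(term, min_gram, max_gram):
    # Suffixes of length min_gram..max_gram ("back" edge n-grams).
    # min_gram is assumed to be >= 1 (term[-0:] would be the whole term).
    return [term[-n:] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def inner_ngrams(term, min_gram, max_gram):
    # Every substring of length min_gram..max_gram ("middle" n-grams).
    return [term[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(term) - n + 1)]


term = "nanotech"
edges = set(front_edge_ngrams(term, 2, 4)) | set(back_edge_ngrams(term, 2, 4))
# With the same min/max, every edge gram is already a middle gram.
print(edges <= set(inner_ngrams(term, 2, 4)))  # True

# A two-character term only produces a gram if min_gram <= 2.
print(inner_ngrams("3c", 2, 4))  # ['3c']
print(inner_ngrams("3c", 3, 4))  # []
```

If min_gram were 3, "3C" would contribute nothing to the nGram fields and could only match through the whole-term (phrase) field.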
Thanks! Any help would be greatly appreciated!
-Zach
I think, assuming the comparerc.com site is based on this, that you should
lowercase at both index and search time. No hits for e.g. "wheel" :)
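The point here is that the same normalization has to run at index time and at search time; if only one side lowercases, "wheel" and "Wheel" become different terms. A toy inverted index in Python shows the effect. It uses whitespace tokenization, hypothetical documents and hypothetical helper names, and is not Elasticsearch's actual analysis code.

```python
from collections import defaultdict


def analyze(text, lowercase):
    # Whitespace tokenizer, optionally followed by a lowercase filter.
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens


def build_index(docs, lowercase):
    # Inverted index: analyzed term -> ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, lowercase):
            index[term].add(doc_id)
    return index


def search(index, query, lowercase):
    # Ids of documents containing every analyzed query term.
    terms = analyze(query, lowercase)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {1: "Wheel Adapter 17mm", 2: "Tire Set"}
raw_index = build_index(docs, lowercase=False)  # index keeps "Wheel"
lc_index = build_index(docs, lowercase=True)    # index stores "wheel"

print(search(raw_index, "wheel", lowercase=True))  # set(): case mismatch
print(search(lc_index, "Wheel", lowercase=True))   # {1}
```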
greetings
Runar Myklebust
Enonic AS
An Open Source Company www.enonic.com/download
On Wed, Jun 27, 2012 at 2:21 AM, Zachary Tong <zacharyjtong@gmail.com> wrote:
Hehe, that is indeed my site. How'd you find it?
In any case, my analyzers are lowercasing. If you look at the mapping,
I'm applying [ "standard", "lowercase", "asciifolding" ] to all the
indexed product names.
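A rough Python approximation of the [ "standard", "lowercase", "asciifolding" ] chain, useful for reasoning about which terms end up in the index. The regex is only a stand-in for the standard tokenizer, and the NFKD-based folding handles accented letters but not every mapping Lucene's ASCII folding filter performs (characters such as "ø" or "ß" are left unchanged here). The function names are hypothetical; Elasticsearch's _analyze API is the authoritative way to check the real tokens.

```python
import re
import unicodedata


def ascii_fold(token):
    # Decompose accented characters (NFKD) and drop the combining marks,
    # so "é" becomes "e". Characters without a decomposition (e.g. "ø")
    # are left unchanged, unlike Lucene's ASCII folding filter.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def analyze(text):
    # Stand-in for the standard tokenizer: runs of Unicode letters/digits.
    tokens = re.findall(r"[^\W_]+", text)
    # Then lowercase, then ASCII-fold, in the same order as the mapping.
    return [ascii_fold(t.lower()) for t in tokens]


print(analyze("Nanotech 2200mAh 3C Élite Café"))
# ['nanotech', '2200mah', '3c', 'elite', 'cafe']
```

The example also shows that a short term like "3C" passes through this chain intact as "3c".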
I'm getting search results for "wheel" on my end, though. What browser are
you using? It may just be a problem with my JavaScript, unrelated to
Elasticsearch.
-Zach
On Friday, June 29, 2012 8:35:23 AM UTC-4, Runar Myklebust wrote: