I have a list of products, where each product has a category, color,
designer and description.
The problem I am trying to solve is search relevancy.
I started by indexing everything as-is, but the results were not satisfying:
since some products had category words in their description
or designer name, results were scored unfairly and relevancy was low.
The next thing I tried was stopwords. Basically, any word used as a
category name was put into the stop filter of the other fields,
and the same for colors. This greatly improved relevancy. A product with a
color or category name in its description or designer no longer biases the search results.
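
To make the stopword setup concrete, here is a rough sketch in plain Python rather than in the search engine's own configuration. The word lists, the `tokenize` helper and the `analyze_product` function are all made up for illustration; the real analyzer does more than split on whitespace.

```python
# Hypothetical vocabularies; in practice these would come from the catalog.
CATEGORY_WORDS = {"carpet", "carpets", "lamp", "lamps"}
COLOR_WORDS = {"red", "blue", "green"}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def analyze_product(product):
    """Return per-field token lists, dropping category and color words
    from the free-text fields so they only match in their own field."""
    leaked = CATEGORY_WORDS | COLOR_WORDS
    return {
        "category": tokenize(product["category"]),
        "color": tokenize(product["color"]),
        "designer": [t for t in tokenize(product["designer"]) if t not in leaked],
        "description": [t for t in tokenize(product["description"]) if t not in leaked],
    }


product = {
    "category": "lamps",
    "color": "blue",
    "designer": "Acme",
    "description": "Blue lamp that looks great next to red carpets",
}
indexed = analyze_product(product)
print(indexed["description"])
```

With this, a search for "red carpets" can no longer hit the lamp through its description, which is the improvement described above.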
Search results for most test cases are very good, but one problem
remains: ambiguity.
Let's say we have products in the category carpets, some have the color red, and
there is a designer named "red john". When searching
for red carpets, I'd like to score carpets in the color red higher than
anything else. However, "red john carpets" should find
carpets designed by "red john", and not just anything that's red, john or
carpet. The stopword filter solution removes red from the designer field,
so it can't be found this way. What is the best way to solve this problem?
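
The ambiguity can be reproduced with a few lines of plain Python. This is only an illustration of the failure, not a fix; `analyze_designer` and `contains_phrase` are hypothetical helpers, and the color list is an assumption.

```python
# Assumed color vocabulary that is applied as stopwords to the designer field.
COLOR_WORDS = {"red", "blue"}


def analyze_designer(name):
    """Tokenize a designer name and drop color words, as the stop filter does."""
    return [t for t in name.lower().split() if t not in COLOR_WORDS]


def contains_phrase(tokens, phrase):
    """True if the token list `phrase` occurs as a contiguous run in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


designer = "Red John"
filtered = analyze_designer(designer)   # what ends up in the index
unfiltered = designer.lower().split()   # what the name really is

wanted = ["red", "john"]
print(filtered, contains_phrase(filtered, wanted))
print(unfiltered, contains_phrase(unfiltered, wanted))
```

The filtered designer field only contains "john", so the phrase "red john" never matches it, while the unfiltered name would match but brings back the original bias.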
Jorg is right. First you ring a bell, then you blow a whistle and then you
look for some other tool to ask to become a bell or whistle. In the
meantime, you know nothing about your data or relevancy models. Good luck,
In reply to Maciej Dziardziel's message of Saturday, August 24, 2013 7:08:56 AM UTC-4.