On Friday, July 26, 2013 3:30:14 AM UTC-6, Adrien Grand wrote:
> Hi,
> I tend to prefer index-time synonyms. If you apply synonyms only at
> search time, scoring becomes biased in favor of documents that contain
> the least frequent synonym, because rarer terms have a higher IDF.
> For example, if "television" and "TV" are synonyms and "television" is
> much more frequent than "TV" in your index, then searching for either
> "TV" or "television" will rank documents that contain "TV" first.
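A quick way to see the bias Adrien describes is to score a toy corpus by hand. This Python sketch is a simplified model, not Elasticsearch's actual scoring. It assumes Lucene's classic TF-IDF idf, roughly 1 + ln(numDocs / (docFreq + 1)), and scores a document as the sum of idf over the query terms it contains, ignoring term frequency, field norms and coordination. The corpus (900 "television" documents, 10 "tv" documents, 90 unrelated ones) and the SYNONYMS table are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical synonym group mirroring the "television"/"TV" example.
SYNONYMS = {"tv": {"tv", "television"}, "television": {"tv", "television"}}

def expand(terms):
    """Replace every term by its synonym group (or by itself if it has none)."""
    expanded = set()
    for t in terms:
        expanded |= SYNONYMS.get(t, {t})
    return expanded

def idf(doc_freq, num_docs):
    # Assumed classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(doc, query, df, num_docs):
    # Simplified model: sum of idf over the query terms the document contains.
    return sum(idf(df[t], num_docs) for t in query if t in doc)

def rank(docs, query):
    # Each doc is a set of terms, so this Counter yields document frequencies.
    df = Counter(t for doc in docs for t in doc)
    return [score(doc, query, df, len(docs)) for doc in docs]

# Toy corpus: 900 docs say "television", 10 say "tv", 90 are unrelated.
raw = [{"television"}] * 900 + [{"tv"}] * 10 + [{"radio"}] * 90
television_doc, tv_doc = 0, 900  # index of one doc of each kind

# Search-time synonyms: raw terms are indexed, the query is expanded.
s = rank(raw, expand({"television"}))
print("search-time:", round(s[tv_doc], 2), round(s[television_doc], 2))

# Index-time synonyms: documents are expanded, the query is not.
indexed = [expand(doc) for doc in raw]
i = rank(indexed, {"television"})
print("index-time: ", round(i[tv_doc], 2), round(i[television_doc], 2))
```

With search-time expansion the "tv" document scores about 5.5 against about 1.1 for the "television" one, purely because "tv" is rarer. With index-time expansion every matching document carries both terms, so both terms share the same document frequency (910) and the two documents tie.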
Yeah, in my experience, I agree. Index-time synonyms, while not as
flexible (you need to reindex whenever the synonym list changes), end up
giving a much better experience.
Only do the expansion in the field's index analyzer, not in its search
analyzer.
Best Regards,
Paul
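Paul's advice maps onto two pieces of index configuration: a synonym token filter used only by the analyzer that runs at index time, and a search analyzer with the same chain minus that filter, so that a query for "TV" is still lowercased to match the indexed token "tv". This Python sketch only builds those fragments as plain dicts. The analyzer and filter names are hypothetical, and the surrounding create-index request layout is left out because it varies across Elasticsearch versions. The legacy flag reflects an assumption worth checking against your version's mapping documentation: 0.90/1.x-era string fields took index_analyzer and search_analyzer, while newer text fields use analyzer plus search_analyzer.

```python
import json

def synonym_analysis(synonym_rules):
    """Analysis settings with two hypothetical analyzers: one that expands
    synonyms (index time) and an identical chain without them (search time)."""
    return {
        "filter": {
            "my_synonyms": {"type": "synonym", "synonyms": synonym_rules},
        },
        "analyzer": {
            "index_with_synonyms": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "my_synonyms"],
            },
            "search_without_synonyms": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase"],  # same chain minus the synonym filter
            },
        },
    }

def field_mapping(legacy=False):
    """Mapping for one field that expands synonyms only while indexing."""
    if legacy:
        # Assumed 0.90/1.x-era string field with an explicit index_analyzer.
        return {"type": "string",
                "index_analyzer": "index_with_synonyms",
                "search_analyzer": "search_without_synonyms"}
    # Newer versions: "analyzer" is used for indexing (and for search unless
    # overridden); "search_analyzer" overrides it for queries.
    return {"type": "text",
            "analyzer": "index_with_synonyms",
            "search_analyzer": "search_without_synonyms"}

print(json.dumps({"analysis": synonym_analysis(["tv, television"]),
                  "field": field_mapping(legacy=True)}, indent=2))
```

Keeping the search chain identical apart from the synonym filter matters: if the two analyzers tokenized or lowercased differently, queries could miss the expanded tokens. Changing the synonym rules still means reindexing, which is the flexibility cost mentioned above.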