I have had a cluster running 0.17.8 which I just updated to
0.18.5 and (foolishly) at the same time I added a synonym file with
the mapping of "plug-in, plug in, plugin, plug-ins, plug ins,
plugins".
Worked fine on my test server, but when I updated the production
server (so old 0.17.8 index now running 0.18.5 + synonyms) my searches
for plugin stopped returning any results even though there had been
results for each of the terms before. My synonym filter is using the
default configuration which I think means expand == true and so any of
the above terms should be rewritten to all of the terms.
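
To make that assumption concrete, here is a rough Python sketch of how I understand a comma-separated synonym line to be interpreted. This is not Elasticsearch or Lucene code; the function name `parse_synonym_line` is made up for illustration, and it ignores tokenization, stemming and multi-word matching entirely.

```python
def parse_synonym_line(line, expand=True):
    """Turn one comma-separated synonym line into a term -> replacements map.

    Assumption: with expand=True every term maps to all terms in the group;
    with expand=False every term maps to the first term only.
    """
    terms = [t.strip() for t in line.split(",") if t.strip()]
    if expand:
        return {t: list(terms) for t in terms}
    return {t: [terms[0]] for t in terms}


mapping = parse_synonym_line(
    "plug-in, plug in, plugin, plug-ins, plug ins, plugins"
)
# Under this reading, "plugin" maps to all six terms in the group.
print(mapping["plugin"])
```

If that reading is right, a query for plugin should be expanded to every variant in the group, which is why the empty result set surprised me.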
By reindexing I'm able to fix the problem (not surprisingly). But it
would be nice to be able to apply synonyms without reindexing.
Especially if/when the index gets so big that it would take weeks to
completely reindex. Any suggestions on what I might have done wrong or
ways to avoid reindexing in the future?
My Analyzer configuration is:
index :
  analysis :
    filter :
      light_stemmer :
        type : stemmer
        name : minimal_english
      snowball :
        type : snowball
        language : English
      wd_filter :
        type : word_delimiter
        generate_word_parts : true
        generate_number_parts : true
        catenate_words : true
        split_on_case_change : true
        preserve_original : true
        split_on_numerics : true
      ac_synonym :
        type : synonym
        synonyms_path : ac-synonyms.txt
    analyzer :
      default :
        type : custom
        tokenizer : uax_url_email
        filter : [wd_filter,lowercase,light_stemmer,ac_synonym]
        char_filter : [html_strip]