Adding Synonyms to an Existing Index


(Greg Brown) #1

Greetings,

I have had a cluster running 0.17.8, which I just updated to
0.18.5 and (foolishly) at the same time I added a synonym file with
the mapping of "plug-in, plug in, plugin, plug-ins, plug ins,
plugins".

Worked fine on my test server, but when I updated the production
server (so old 0.17.8 index now running 0.18.5 + synonyms) my searches
for plugin stopped returning any results even though there had been
results for each of the terms before. My synonym filter is using the
default configuration which I think means expand == true and so any of
the above terms should be rewritten to all of the terms.

By reindexing I'm able to fix the problem (not surprisingly). But it
would be nice to be able to apply synonyms without reindexing.
Especially if/when the index gets so big that it would take weeks to
completely reindex. Any suggestions on what I might have done wrong or
ways to avoid reindexing in the future?

My Analyzer configuration is:
index :
  analysis :
    filter :
      light_stemmer :
        type : stemmer
        name : minimal_english
      snowball :
        type : snowball
        language : English
      wd_filter :
        type : word_delimiter
        generate_word_parts : true
        generate_number_parts : true
        catenate_words : true
        split_on_case_change : true
        preserve_original : true
        split_on_numerics : true
      ac_synonym :
        type : synonym
        synonyms_path : ac-synonyms.txt
    analyzer :
      default :
        type : custom
        tokenizer : uax_url_email
        filter : [wd_filter,lowercase,light_stemmer,ac_synonym]
        char_filter : [html_strip]

Thanks
-Greg


(ppearcy) #2

You could apply the synonym filter to the search analyzer only.
Otherwise, I'm pretty sure you need to re-index.
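
A rough sketch of what "search analyzer only" could look like, written as a Python dict that mirrors the analysis settings from the original post. It assumes Elasticsearch picks up analyzers named default_index and default_search as the index-time and search-time defaults; check that against the version you run. The helper name build_analysis_settings is made up for illustration, and the unused snowball filter from the original config is left out.

```python
import json

# Token filters shared by both analyzers, as listed in the original post.
BASE_FILTERS = ["wd_filter", "lowercase", "light_stemmer"]


def build_analysis_settings(synonym_filter="ac_synonym"):
    """Return analysis settings in which only the search analyzer applies synonyms.

    Hypothetical helper: it only builds the settings structure and does not
    talk to a cluster.
    """

    def analyzer(filters):
        return {
            "type": "custom",
            "tokenizer": "uax_url_email",
            "filter": filters,
            "char_filter": ["html_strip"],
        }

    return {
        "analysis": {
            "filter": {
                "light_stemmer": {"type": "stemmer", "name": "minimal_english"},
                "wd_filter": {
                    "type": "word_delimiter",
                    "generate_word_parts": True,
                    "generate_number_parts": True,
                    "catenate_words": True,
                    "split_on_case_change": True,
                    "preserve_original": True,
                    "split_on_numerics": True,
                },
                synonym_filter: {
                    "type": "synonym",
                    "synonyms_path": "ac-synonyms.txt",
                },
            },
            "analyzer": {
                # Assumed names: index-time default, no synonym filter.
                "default_index": analyzer(list(BASE_FILTERS)),
                # Assumed names: search-time default, synonyms applied last.
                "default_search": analyzer(BASE_FILTERS + [synonym_filter]),
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_analysis_settings(), indent=2))
```

With this split, documents that are already indexed keep their existing terms, and only the query side is expanded through the synonym list.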


(Shay Banon) #3

You can update the synonym file itself, and then close and open the index,
it will re-read the synonym file.
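
For reference, the close/open cycle could be scripted roughly as below. This is only a sketch: it assumes the cluster answers HTTP on a base URL such as http://localhost:9200, that the index is called myindex (both placeholders), and that the edited synonym file has already been copied to every node. The function name reload_synonyms and its send parameter are made up for illustration. The index is unavailable while it is closed, and terms already written to the index are not changed by this.

```python
import urllib.request


def reload_synonyms(base_url, index, send=urllib.request.urlopen):
    """Close and then reopen `index` so its analyzers are rebuilt.

    Hypothetical helper. `send` defaults to urllib's urlopen, which raises
    on HTTP error responses; it can be swapped out for a dry run.
    Returns the URLs that were requested, in order.
    """
    requested = []
    for action in ("_close", "_open"):
        url = "{}/{}/{}".format(base_url.rstrip("/"), index, action)
        # Both endpoints are called with POST and an empty body.
        request = urllib.request.Request(url, data=b"", method="POST")
        response = send(request)
        response.close()
        requested.append(url)
    return requested


if __name__ == "__main__":
    # Placeholder host and index name; adjust before running.
    print(reload_synonyms("http://localhost:9200", "myindex"))
```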

(In reply to Greg Brown's post of Fri, Dec 2, 2011 at 4:31 AM.)

