My thinking may be flawed, but here is the problem I am trying to solve.
I have a synonym filter and analyzer configured like this:
analyzer: {
synonym_analyzer: {
tokenizer: 'standard',
filter: ['lowercase', 'synonym_filter'],
},
},
filter: {
synonym_filter: {
type: 'synonym',
synonyms: [
'mambo 5, mambo number 5 => mambo5',
],
},
},
///
search_headline: {
type: 'text',
analyzer: "synonym_analyzer"
},
In this example, suppose I index a document like this:
{ search_headline: "Mambo number 5" }
Analysis converts this field into a single token:
mambo5
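So if I search for "number 5" on this field, the search misses. The indexed tokens "mambo", "number" and "5" were replaced by "mambo5", while the query is analyzed into the tokens "number" and "5", which match no synonym rule and therefore stay unchanged. The two sides share no terms.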
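To make the mismatch concrete, here is a toy JavaScript model of the analysis chain above. It is a sketch under simplifying assumptions, not how Elasticsearch is implemented: the `standard` tokenizer is approximated by splitting on non-alphanumeric characters, the synonym step only handles explicit `a, b => c` rules using greedy longest match, and the helper names (`parseRule`, `tokenize`, `applySynonyms`) are hypothetical.

```javascript
// Toy model of synonym_analyzer (NOT Elasticsearch's implementation):
// tokenize (~ standard) -> lowercase -> synonym replacement.

// Parse 'mambo 5, mambo number 5 => mambo5' into
// { inputs: [['mambo','5'], ['mambo','number','5']], output: ['mambo5'] }.
function parseRule(rule) {
  const [lhs, rhs] = rule.split('=>');
  return {
    inputs: lhs.split(',').map((s) => s.trim().toLowerCase().split(/\s+/)),
    output: rhs.trim().split(/\s+/),
  };
}

// Rough stand-in for the standard tokenizer: split on non-alphanumerics.
function tokenize(text) {
  return text.split(/[^A-Za-z0-9]+/).filter((t) => t.length > 0);
}

function lowercase(tokens) {
  return tokens.map((t) => t.toLowerCase());
}

// At each position, replace the longest matching rule input with the
// rule's output; tokens that match no rule pass through unchanged.
function applySynonyms(tokens, rules) {
  const out = [];
  let i = 0;
  while (i < tokens.length) {
    let best = null;
    for (const rule of rules) {
      for (const input of rule.inputs) {
        const matches = input.every((w, k) => tokens[i + k] === w);
        if (matches && (best === null || input.length > best.len)) {
          best = { len: input.length, output: rule.output };
        }
      }
    }
    if (best) {
      out.push(...best.output);
      i += best.len;
    } else {
      out.push(tokens[i]);
      i += 1;
    }
  }
  return out;
}

const rules = [parseRule('mambo 5, mambo number 5 => mambo5')];
const synonymAnalyzer = (text) => applySynonyms(lowercase(tokenize(text)), rules);

const indexed = synonymAnalyzer('Mambo number 5'); // → ['mambo5']
const query = synonymAnalyzer('number 5');         // → ['number', '5']

// OR-style match: succeeds if any query token is among the indexed tokens.
const matches = query.some((t) => indexed.includes(t)); // → false
console.log(indexed, query, matches);
```

The query and the indexed field end up with disjoint token lists, which is exactly why the search misses.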
My thought is to add an additional field:
search_synonyms: {
type: 'text',
analyzer: 'synonym_analyzer'
},
search_headline: {
type: 'text',
copy_to: "search_synonyms"
},
This way, the original field is not modified (it uses the default analyzer), and synonym analysis is applied only to the new search_synonyms field.
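A similar toy model of the copy_to idea, again only a sketch: the names `tokenize`, `synonymAnalyzer` and `matchingFields` are hypothetical, the single rule is hard-coded, and `tokenize` stands in for the default (standard) analyzer. Each field keeps its own token list, and the query is analyzed with each field's analyzer before an OR-style term overlap check.

```javascript
// Toy model of the copy_to setup (NOT Elasticsearch internals): the same
// source text is indexed into two fields, each with its own analysis chain.
const tokenize = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0); // ~ standard + lowercase

// Hard-coded version of 'mambo 5, mambo number 5 => mambo5', longest input first.
const RULE_INPUTS = [['mambo', 'number', '5'], ['mambo', '5']];
const RULE_OUTPUT = 'mambo5';

function synonymAnalyzer(text) {
  const tokens = tokenize(text);
  const out = [];
  for (let i = 0; i < tokens.length; ) {
    const hit = RULE_INPUTS.find((input) => input.every((w, k) => tokens[i + k] === w));
    if (hit) {
      out.push(RULE_OUTPUT);
      i += hit.length;
    } else {
      out.push(tokens[i]);
      i += 1;
    }
  }
  return out;
}

// search_headline keeps the plain tokens; copy_to sends the same text to
// search_synonyms, which is analyzed with synonym_analyzer.
const headline = 'Mambo number 5';
const index = {
  search_headline: tokenize(headline),        // → ['mambo', 'number', '5']
  search_synonyms: synonymAnalyzer(headline), // → ['mambo5']
};

// A field matches if any query token (analyzed with that field's analyzer)
// appears among the field's indexed tokens.
function matchingFields(query) {
  return [
    ['search_headline', tokenize],
    ['search_synonyms', synonymAnalyzer],
  ]
    .filter(([field, analyze]) => analyze(query).some((t) => index[field].includes(t)))
    .map(([field]) => field);
}

console.log(matchingFields('number 5')); // → ['search_headline']
console.log(matchingFields('mambo 5'));  // → ['search_headline', 'search_synonyms']
```

In this model "number 5" is now found through search_headline, while the synonym form "mambo 5" still hits search_synonyms.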
My question is: is it possible to "remove" terms that don't match a synonym when indexing into the search_synonyms field?
For example, if I index this document:
{ search_headline: "My favorite song is Mambo number 5" }
I would like the indexed document to look like this:
{
search_synonyms: "mambo5", // could be an array if multiple synonyms are found
search_headline: "My favorite song is Mambo number 5"
}
That way I could search across both fields, and the search_synonyms field wouldn't duplicate so much of the indexed content.
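The filtering I'm asking about can be modeled the same way. In this hypothetical sketch (`synonymTokens` and `keepSynonymsOnly` are illustrative names, and the rule is hard-coded), each token is tagged with its origin and everything not produced by the synonym rule is dropped. The tag value `SYNONYM` loosely mirrors, to my understanding, the token type that Elasticsearch's `_analyze` API reports for synonym output, but the code is only a model. It covers indexed terms only: copy_to does not modify `_source`, so search_synonyms would not show up in the stored document either way.

```javascript
// Toy model (NOT Elasticsearch internals) of the desired search_synonyms
// analysis: apply the synonym rule, then drop tokens the rule did not produce.
const tokenize = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

// Hard-coded version of 'mambo 5, mambo number 5 => mambo5', longest input first.
const RULE_INPUTS = [['mambo', 'number', '5'], ['mambo', '5']];
const RULE_OUTPUT = 'mambo5';

// Tag each token: 'SYNONYM' if the rule produced it, 'word' otherwise.
function synonymTokens(text) {
  const tokens = tokenize(text);
  const out = [];
  for (let i = 0; i < tokens.length; ) {
    const hit = RULE_INPUTS.find((input) => input.every((w, k) => tokens[i + k] === w));
    if (hit) {
      out.push({ term: RULE_OUTPUT, type: 'SYNONYM' });
      i += hit.length;
    } else {
      out.push({ term: tokens[i], type: 'word' });
      i += 1;
    }
  }
  return out;
}

// The "remove terms that don't match a synonym" step.
const keepSynonymsOnly = (text) =>
  synonymTokens(text).filter((t) => t.type === 'SYNONYM').map((t) => t.term);

const doc = { search_headline: 'My favorite song is Mambo number 5' };
const indexed = {
  search_synonyms: keepSynonymsOnly(doc.search_headline), // → ['mambo5']
  search_headline: doc.search_headline,
};
console.log(indexed);
console.log(keepSynonymsOnly('Another song entirely'));      // → []
console.log(keepSynonymsOnly('Mambo 5 and mambo number 5')); // → ['mambo5', 'mambo5']
```

Headlines with no synonym produce an empty token list, and several matches produce several terms, which corresponds to the "could be an array" case.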
Like I said, my way of thinking may be flawed, so hopefully this makes sense.