Help with synonyms and edge ngram analyzers


(Dan Tam) #1

Hi,

I'm having problems with an analyzer that uses both synonym and edge
ngram filters. I opened an issue on GitHub a few days ago:
https://github.com/elasticsearch/elasticsearch/issues/1835 . There is
also a gist to recreate the problem: https://gist.github.com/2287663

When I use an analyzer with an edge ngram filter and a synonym filter
at index time, for synonyms defined as "word => synonym", "word"
is not indexed at all.

The behavior differs depending on the order in which the filters are
defined. If the filter list is ["standard", "lowercase", "ngrams",
"synonym"], "word" is indexed as "w", "wo", "wor", "synonym". If
the order of "ngrams" and "synonym" is reversed, the indexed tokens
are: "s", "sy", "syn", ... "synony", "word".
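The first ordering above can be reproduced with a toy model of the two filters. This is only a sketch in Python, not Lucene's implementation: edge_ngrams and apply_synonyms are hypothetical helpers, the gram sizes (1 to 4) are assumed, and "=>" is modelled as a plain replacement of the left-hand term, which is how explicit mappings in the Solr synonym format behave.

```python
def edge_ngrams(tokens, min_gram=1, max_gram=4):
    """Emit the leading prefixes of each token, from min_gram to max_gram characters."""
    out = []
    for tok in tokens:
        upper = min(max_gram, len(tok))
        out.extend(tok[:n] for n in range(min_gram, upper + 1))
    return out


def apply_synonyms(tokens, mapping):
    """Replace each token that has an explicit 'lhs => rhs' rule with its rhs."""
    return [mapping.get(tok, tok) for tok in tokens]


# Models the rule "word => synonym" from the post.
SYNONYMS = {"word": "synonym"}

# Filter order from the post: ngrams first, then synonym.
ngrams_then_synonym = apply_synonyms(edge_ngrams(["word"]), SYNONYMS)
print(ngrams_then_synonym)  # ['w', 'wo', 'wor', 'synonym']
```

In this model the full token "word" survives the ngram stage only as the longest gram, and the synonym stage then rewrites it, which is consistent with "word" itself never reaching the index.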

Any help is much appreciated.

Thanks,
Dan


(Shay Banon) #2

What exactly are you trying to do? Have ngrams applied to the synonyms as
well? It probably makes sense in this case to reverse the order: first
apply the synonym filter, and then apply the ngram filter to its output.
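As a sketch of the reordering suggested here, the index settings could look roughly like the following, written as a Python dict that serializes to a JSON settings body. The analyzer name my_analyzer, the gram sizes, and the tokenizer choice are assumptions for illustration; the filter names "ngrams" and "synonym" and the rule "word => synonym" come from the original post, and the actual settings in the gist may differ.

```python
import json

settings = {
    "analysis": {
        "filter": {
            # Explicit mapping taken from the original post.
            "synonym": {"type": "synonym", "synonyms": ["word => synonym"]},
            # "edgeNGram" is the older spelling of the filter type;
            # newer Elasticsearch versions use "edge_ngram".
            # The gram sizes here are assumed values.
            "ngrams": {"type": "edgeNGram", "min_gram": 1, "max_gram": 10},
        },
        "analyzer": {
            # Hypothetical analyzer name.
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                # Synonym filter runs before the ngram filter.
                "filter": ["standard", "lowercase", "synonym", "ngrams"],
            }
        },
    }
}

settings_json = json.dumps(settings, indent=2)
print(settings_json)
```

With an explicit "=>" rule the left-hand term is still replaced, so this ordering would be expected to produce grams of "synonym" only. If the goal is to keep both terms, an equivalence rule such as "word, synonym" is the usual alternative.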

(In reply to Dan Tam's post of Thu, Apr 5, 2012, 9:32 AM.)

