I have an analysis chain like this for some Spanish text:
standard asciifolding lowercase es_stop_filter es_stem_filter es_synonyms
With synonyms at the end, after all the other filters, I have to define my
synonyms in their stemmed, ASCII-folded, lowercase forms. So instead of
defining a synonym set like "vacuna, vacunación, inmunización", I have to
define it as "vacun, vacunacion, inmunizacion".
In the case of a very aggressive stemmer like Snowball for English, we
would have to define "intern, global" as a synonym mapping when we'd really
want to write "international, global".
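To make the Spanish example above concrete, here is a minimal Python sketch of just the ASCII-folding and lowercasing steps. It uses only the standard library and only approximates what the asciifolding and lowercase filters do; it does not stem, so the stemmed forms would still have to come from the real analysis chain. The function names are hypothetical, chosen only for this illustration.

```python
import unicodedata


def fold_and_lowercase(term: str) -> str:
    """Approximate asciifolding + lowercase: drop diacritics, then lowercase."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", term)
    # Keep only the base characters (assumption: this covers the Spanish
    # accents we care about; the real asciifolding filter handles far more).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def normalize_synonym_line(line: str) -> str:
    """Apply the folding step to each comma-separated term of a synonym set."""
    terms = [t.strip() for t in line.split(",")]
    return ", ".join(fold_and_lowercase(t) for t in terms if t)


if __name__ == "__main__":
    print(normalize_synonym_line("vacuna, vacunación, inmunización"))
    # prints: vacuna, vacunacion, inmunizacion
```

This only removes the accent and case mismatch; the stemming mismatch described above remains.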
This is a little counter-intuitive for the folks who define our synonyms,
as they think in dictionary terms and not stemmed tokens. To see which
tokens to specify in the synonyms file, they need access to a
"standard asciifolding lowercase es_stop_filter es_stem_filter" analysis
chain that applies everything but the synonym filter.
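For reference, the current workaround could be sketched roughly as below: one analyzer that stops before the synonym filter (for the synonym authors to test terms against) and one full analyzer. This is a sketch under assumptions, not our exact configuration: the analyzer names es_pre_synonym and es_full, the synonyms path, and the definitions of es_stop_filter and es_stem_filter (including the stemmer language) are all hypothetical, and the body-style _analyze request assumes a version of Elasticsearch that accepts "analyzer" and "text" in the request body.

```python
import json

# Filters applied before synonyms, in the order given in the chain above.
PRE_SYNONYM_FILTERS = ["asciifolding", "lowercase", "es_stop_filter", "es_stem_filter"]


def build_settings() -> dict:
    """Index settings with a pre-synonym analyzer and the full analyzer."""
    return {
        "analysis": {
            "filter": {
                # Assumed definitions; the real filters may be configured differently.
                "es_stop_filter": {"type": "stop", "stopwords": "_spanish_"},
                "es_stem_filter": {"type": "stemmer", "language": "light_spanish"},
                # Hypothetical path to the synonyms file.
                "es_synonyms": {"type": "synonym", "synonyms_path": "analysis/es_synonyms.txt"},
            },
            "analyzer": {
                # Everything except the synonym filter, for inspecting tokens.
                "es_pre_synonym": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": PRE_SYNONYM_FILTERS,
                },
                # The full chain, with synonyms applied last.
                "es_full": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": PRE_SYNONYM_FILTERS + ["es_synonyms"],
                },
            },
        }
    }


def build_analyze_request(text: str) -> dict:
    """Body for an _analyze call that shows the tokens to put in the synonyms file."""
    return {"analyzer": "es_pre_synonym", "text": text}


if __name__ == "__main__":
    print(json.dumps(build_settings(), indent=2, ensure_ascii=False))
    print(json.dumps(build_analyze_request("vacunación"), ensure_ascii=False))
```

The synonym authors would send the second body to the index's _analyze endpoint and copy the returned tokens into the synonyms file, which is exactly the extra step I would like to avoid.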
In a blog post about Solr, the author mentions that one could define a
"custom tokenizer that
returns the stemmed form of words from the synonyms file" to get around
this. Is it possible to configure Elasticsearch this way?