You can always re-create the analyzer from scratch as a custom
analyzer. A language analyzer is essentially an analyzer with a
language-specific stemmer filter, so this is not hard to do in Elasticsearch.
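Here is a minimal sketch of that approach, using English as the example language. The Python below builds index settings as a plain dict and serializes them to JSON for a create-index request. The settings rebuild `english` as a custom analyzer and add the `html_strip` char filter and the `asciifolding` token filter. The analyzer name `english_html_folded` and the filter names `english_stop`, `english_stemmer` and `english_possessive_stemmer` are illustrative. The chain is modeled on the rebuilt-English example in the Elasticsearch language-analyzer docs, without the optional `keyword_marker` step. Check it against the docs for your version.

```python
import json


def english_html_folded_settings():
    """Index settings that rebuild the built-in "english" analyzer as a
    custom analyzer, adding html_strip and asciifolding.
    Analyzer and filter names here are illustrative, not built-ins."""
    return {
        "analysis": {
            "filter": {
                # Same building blocks the english analyzer uses internally.
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                "english_stemmer": {"type": "stemmer", "language": "english"},
                "english_possessive_stemmer": {
                    "type": "stemmer",
                    "language": "possessive_english",
                },
            },
            "analyzer": {
                "english_html_folded": {
                    "type": "custom",
                    # Strip HTML tags before tokenizing.
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",      # case-insensitive matching
                        "asciifolding",   # e.g. "café" -> "cafe"
                        "english_stop",
                        "english_stemmer",
                    ],
                }
            },
        }
    }


settings = english_html_folded_settings()
# Body for a create-index request (PUT /<index>).
print(json.dumps({"settings": settings}, indent=2))
```

Use the same analyzer at index and search time, for example by setting it as the field's `analyzer` in the mapping. That way queries are lowercased and folded the same way the documents were.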
I have never used a language analyzer, but I would assume it already
applies lowercase and asciifolding, or at least the former.
I have documents in many languages containing basic html, that need to be
searched in a case insensitive, ascii-folded manner.
Is it possible to use the standard language analyzers from Elasticsearch
(in addition to plugins such as the Smart Chinese and Stempel analyzers) in
conjunction with the html_strip char_filter and the lowercase and asciifolding
token filters?
As far as I can tell this isn't possible by config alone, but I would love to
be proved wrong.
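The plugin-based analyzers can be handled the same way, by reusing the plugin's stemmer filter or tokenizer inside a custom analyzer. Here is a sketch for Polish and Chinese. It assumes the analysis-stempel and analysis-smartcn plugins are installed and that they expose a `polish_stem` token filter and a `smartcn_tokenizer` tokenizer. Those are the names in recent plugin versions, and older releases may differ. The analyzer names and the `index_settings` helper are hypothetical. For Polish, `asciifolding` is placed after `polish_stem`, because folding first would strip the diacritics the stemmer relies on.

```python
import json

# Hypothetical analyzer names. Tokenizer/filter names assume the
# analysis-stempel and analysis-smartcn plugins (recent versions).
PLUGIN_ANALYZERS = {
    "polish_html_folded": {
        "type": "custom",
        "char_filter": ["html_strip"],
        "tokenizer": "standard",
        # Stem before folding: stempel expects Polish diacritics (ą, ł, ...).
        "filter": ["lowercase", "polish_stem", "asciifolding"],
    },
    "chinese_html_folded": {
        "type": "custom",
        "char_filter": ["html_strip"],
        # smartcn segments Chinese text into words.
        "tokenizer": "smartcn_tokenizer",
        # Lowercase/fold any embedded Latin-script text.
        "filter": ["lowercase", "asciifolding"],
    },
}


def index_settings(analyzers):
    """Wrap analyzer definitions in a create-index request body."""
    return {"settings": {"analysis": {"analyzer": analyzers}}}


body = index_settings(PLUGIN_ANALYZERS)
print(json.dumps(body, indent=2))
```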