Can language analyzers be configured to use char_filters and token_filters?

Robin_Hughes · May 31, 2012, 2:13pm

Hi

I have documents in many languages, containing basic HTML, that need to be
searched in a case-insensitive, ASCII-folded manner.

Is it possible to use the standard language analyzers from
http://www.elasticsearch.org/guide/reference/index-modules/analysis/lang-analyzer.html
(in addition to plugins such as the Smart Chinese and Stempel analyzers) in
conjunction with the html_strip char_filter and the lowercase and asciifolding
token filters?

As far as I can tell this isn't possible by config alone, but would love to
be proved wrong.

Thanks,
Robin

Ivan · June 1, 2012, 6:53am

Hi Robin,

You can always re-create the analyzer from scratch as a custom
analyzer. Language analyzers are essentially the standard analysis chain
plus language-specific stop word and stemmer filters, so this is not
hard to do in Elasticsearch.
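A minimal sketch of that approach, written in Python only to build and print the index settings JSON. It assumes setting names from recent Elasticsearch releases (the `custom` analyzer type, the `stop` filter with a predefined list such as `_french_`, and the `stemmer` filter with a `language` parameter); the 2012-era syntax may differ slightly. The function and analyzer names (`custom_language_analysis`, `<prefix>_html_folded`) are hypothetical. It only approximates a built-in analyzer: some built-ins add extra filters (e.g. elision for French, possessive stemming for English) that are not reproduced here.

```python
import json


def custom_language_analysis(prefix: str, stopwords: str, stemmer_language: str) -> dict:
    """Build an index "analysis" section that approximates a built-in
    language analyzer, with html_strip and asciifolding added.

    Hypothetical helper; the built-in analyzers may contain extra filters
    that this sketch does not reproduce."""
    stop_name = f"{prefix}_stop"
    stem_name = f"{prefix}_stemmer"
    return {
        "analysis": {
            "filter": {
                # Predefined language-specific stop word list, e.g. "_french_".
                stop_name: {"type": "stop", "stopwords": stopwords},
                # Language-specific stemmer, e.g. "french".
                stem_name: {"type": "stemmer", "language": stemmer_language},
            },
            "analyzer": {
                f"{prefix}_html_folded": {
                    "type": "custom",
                    # Strip HTML tags and decode entities before tokenizing.
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    # Fold accents last, so the stop list and stemmer rules
                    # still see the accented forms they were written for.
                    "filter": ["lowercase", stop_name, stem_name, "asciifolding"],
                }
            },
        }
    }


if __name__ == "__main__":
    body = {"settings": custom_language_analysis("french", "_french_", "french")}
    print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of an index-creation request. Putting asciifolding after the stemmer is a design choice, not a requirement: folding first is also possible, but it can stop accented stop words and suffixes from matching.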

I have never used a language analyzer, but I would assume it does
lowercase and asciifolding already. At least the former.

Ivan

Robin_Hughes · June 1, 2012, 3:34pm

Thanks for your help.

That certainly covers a lot of languages. It looks like some (Polish, Smart
Chinese, Thai) will need a bit of extra work.
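For those languages, the extra work mostly means swapping in a different tokenizer or stemmer while keeping html_strip and asciifolding around it. The sketch below assumes component names from recent Elasticsearch releases and their plugin documentation: `thai` as a built-in tokenizer, `smartcn_tokenizer` from the analysis-smartcn plugin, and `polish_stem` from the analysis-stempel plugin. The plugins must be installed, and the names should be checked against the version in use. The `CHAINS` table and the analyzer names are hypothetical, and no stop word filters are included.

```python
import json

# Hypothetical per-language chains: (tokenizer, filters applied before folding).
# Assumed names: "thai" is a built-in tokenizer in recent releases,
# "smartcn_tokenizer" comes from the analysis-smartcn plugin and
# "polish_stem" from the analysis-stempel plugin.
CHAINS = {
    "thai": ("thai", ["lowercase"]),
    "chinese": ("smartcn_tokenizer", ["lowercase"]),
    "polish": ("standard", ["lowercase", "polish_stem"]),
}


def plugin_language_analyzers() -> dict:
    """Build custom analyzers that wrap each chain with html_strip and asciifolding."""
    analyzers = {}
    for lang, (tokenizer, filters) in CHAINS.items():
        analyzers[f"{lang}_html_folded"] = {
            "type": "custom",
            # Remove HTML markup before the language-specific tokenizer runs.
            "char_filter": ["html_strip"],
            "tokenizer": tokenizer,
            # Fold accents last so the Polish stemmer sees the original forms.
            "filter": filters + ["asciifolding"],
        }
    return {"analysis": {"analyzer": analyzers}}


if __name__ == "__main__":
    print(json.dumps({"settings": plugin_language_analyzers()}, indent=2))
```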

Thanks again,

Robin.