Definition of analyzers with a language?

Jasper_van_Wanrooy_C · May 2, 2011, 8:30am

Hi,

I'm indexing documents in four languages, and each document carries a language flag. I'm trying to work out the best way to structure my analyzer config so that it stays maintainable. Right now I have a separate analyzer for each language. However, some fields need an additional filter to strip out HTML. How can I best achieve that without creating four more analyzers, one per language? Is it possible to add an extra filter to a single field of a document at index time?

The analyzer config I use right now is displayed below.

Thanks,
Jasper

index:
  analysis:
    analyzer:
      english_analyzer:
        type: snowball
        language: English
        tokenizer: default_tokenizer

      german_analyzer:
        type: snowball
        language: German2
        tokenizer: default_tokenizer

      dutch_analyzer:
        type: snowball
        language: Dutch
        tokenizer: default_tokenizer

      french_analyzer:
        type: snowball
        language: French
        tokenizer: default_tokenizer

      # Analyzer used when indexing the category_path
      path_index_analyzer:
        type: custom
        tokenizer: path_tokenizer

      # Analyzer used when searching on the category_path
      path_search_analyzer:
        type: keyword

    tokenizer:
      default_tokenizer:
        type: standard

      # Used for the category path to split each level into separate tokens.
      path_tokenizer:
        type: path_hierarchy
        delimiter: /
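
To make the duplication concrete, here is a rough sketch of what an HTML-stripping variant of just the English analyzer might look like. It assumes the built-in html_strip char filter and a custom analyzer that only approximates the snowball analyzer above. The names english_snowball and english_html_analyzer are hypothetical and not part of my current config, and I have not tested this fragment.

```yaml
index:
  analysis:
    filter:
      # Hypothetical name: a snowball token filter configured for English
      english_snowball:
        type: snowball
        language: English

    analyzer:
      # Hypothetical name: strips HTML first, then tokenizes and stems.
      # Assumption: this only approximates what the snowball analyzer does.
      english_html_analyzer:
        type: custom
        char_filter: [html_strip]
        tokenizer: default_tokenizer
        filter: [lowercase, stop, english_snowball]
```

The same block would have to be repeated for German2, Dutch and French, which is exactly the four extra analyzers I'm hoping to avoid.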
