Definition of analyzers with a language?


(Jasper van Wanrooy - Chatventure) #1

Hi,

I'm indexing documents in four languages, and each document carries a language flag. I'm trying to work out the best way to structure my analyzer config so that it stays maintainable. Right now I have a separate analyzer for each language. For some fields, however, I also need an additional filter that strips out HTML. How can I best achieve that without creating four more analyzers, one per language? Is it possible to add an extra filter to a specific field of a document at index time?

The analyzer config I use right now is displayed below.

Thanks,
Jasper

index:
  analysis:
    analyzer:
      english_analyzer:
        type: snowball
        language: English
        tokenizer: default_tokenizer

      german_analyzer:
        type: snowball
        language: German2
        tokenizer: default_tokenizer

      dutch_analyzer:
        type: snowball
        language: Dutch
        tokenizer: default_tokenizer

      french_analyzer:
        type: snowball
        language: French
        tokenizer: default_tokenizer

      # Analyzer used when indexing the category_path
      path_index_analyzer:
        type: custom
        tokenizer: path_tokenizer

      # Analyzer used when searching on the category_path
      path_search_analyzer:
        type: keyword

    tokenizer:
      default_tokenizer:
        type: standard

      # Used for the category path to split each level into separate tokens.
      path_tokenizer:
        type: path_hierarchy
        delimiter: /
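
For illustration, here is a rough sketch of the duplication the question is trying to avoid: an HTML-stripping variant of just the English analyzer, written as a custom analyzer. The names english_html_analyzer and english_snowball are hypothetical and not part of the config above. It assumes the built-in html_strip character filter and the snowball token filter, and it only approximates the snowball analyzer (stop-word removal is left out). The same block would have to be repeated for German, Dutch and French.

```yaml
index:
  analysis:
    analyzer:
      # Hypothetical name; one such analyzer would be needed per language.
      english_html_analyzer:
        type: custom
        # Strip HTML markup before the text reaches the tokenizer.
        char_filter: [html_strip]
        tokenizer: standard
        filter: [lowercase, english_snowball]

    filter:
      # Hypothetical name for a snowball stemming filter.
      english_snowball:
        type: snowball
        language: English
```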
