Hi,
I'm indexing documents in four languages, and each document carries a language flag. I'm trying to work out the best way to structure my analyzer config so that it stays maintainable. Right now I have a separate analyzer for each language. However, for some fields I need an additional filter to strip out HTML. How can I best achieve that without creating four more analyzers, one per language? Is it possible to add an extra filter to a specific field of a document at index time?
The analyzer config I use right now is displayed below.
Thanks,
Jasper
index:
  analysis:
    analyzer:
      english_analyzer:
        type: snowball
        language: English
        tokenizer: default_tokenizer
      german_analyzer:
        type: snowball
        language: German2
        tokenizer: default_tokenizer
      dutch_analyzer:
        type: snowball
        language: Dutch
        tokenizer: default_tokenizer
      french_analyzer:
        type: snowball
        language: French
        tokenizer: default_tokenizer
      # Analyzer used when indexing the category_path
      path_index_analyzer:
        type: custom
        tokenizer: path_tokenizer
      # Analyzer used when searching on the category_path
      path_search_analyzer:
        type: keyword
    tokenizer:
      default_tokenizer:
        type: standard
      # Used for the category path to split each level into separate tokens.
      path_tokenizer:
        type: path_hierarchy
        delimiter: /
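
To make the duplication I'd like to avoid concrete, here is an untested sketch of what one HTML-stripping variant might look like for English only. The names english_html_analyzer and english_snowball are hypothetical. It assumes a custom analyzer that combines the built-in html_strip char filter with a snowball token filter, which is probably not an exact match for what the snowball analyzer type does internally (for example, no stop-word filter is configured here).

```yaml
index:
  analysis:
    filter:
      # Hypothetical name: snowball stemming as a token filter
      english_snowball:
        type: snowball
        language: English
    analyzer:
      # Hypothetical name: English analysis with HTML removed first
      english_html_analyzer:
        type: custom
        char_filter: [html_strip]
        tokenizer: default_tokenizer
        filter: [lowercase, english_snowball]
```

Repeating this for German, Dutch and French would give four extra analyzers, which is exactly the growth I'm hoping to avoid.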