I'm indexing documents in four languages, and each document carries a language flag. I'm trying to work out the best way to structure my analyzer config so that it stays maintainable. Right now I have a separate analyzer for each language. For some fields, however, I need additional filters to strip out HTML. How can I best achieve that without creating four more analyzers, one per language? Is it possible to add an extra filter to a specific field of a document at index time?
The analyzer config I use right now is displayed below.
    german_analyzer:
      type: snowball
      language: German2
      tokenizer: default_tokenizer
    dutch_analyzer:
      type: snowball
      language: Dutch
      tokenizer: default_tokenizer
    french_analyzer:
      type: snowball
      language: French
      tokenizer: default_tokenizer
    # Analyzer used when indexing the category_path
    path_index_analyzer:
      type: custom
      tokenizer: path_tokenizer
    # Analyzer used when searching on the category_path
    path_search_analyzer:
      type: keyword
  tokenizer:
    default_tokenizer:
      type: standard
    # Used for the category path to split each level into separate tokens.
    path_tokenizer:
      type: path_hierarchy
      delimiter: /
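
To make the duplication concrete, this is roughly what one HTML-stripping variant would look like if I went the "one extra analyzer per language" route. It is only a sketch: the names german_html_analyzer and german_snowball are made up for illustration, and I'm assuming a custom analyzer with the built-in html_strip char filter plus a snowball token filter comes close enough to what the snowball analyzer type does. I would need one of these for every language, which is what I'd like to avoid.

```yaml
# Hypothetical sketch: german_html_analyzer and german_snowball are
# illustrative names, not part of my current config.
analyzer:
  german_html_analyzer:
    type: custom
    char_filter: [html_strip]             # strips HTML markup before tokenizing
    tokenizer: standard
    filter: [lowercase, german_snowball]  # approximates the snowball analyzer
filter:
  german_snowball:
    type: snowball
    language: German2
```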