I need more clarity on language analyzers and HTML filtering. My content sometimes comes wrapped in HTML tags, which I need to strip out during indexing. The content also varies by language: I create a mapping for each language and have to use the appropriate analyzer. How do I combine these?
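For the tag-stripping part on its own, here is a rough client-side approximation of what "strip the HTML, keep the text" means. It uses Python's standard `html.parser`. It is not the search engine's own HTML filter (for example, Elasticsearch's `html_strip` char filter) and may differ on edge cases such as block-level tags or `<script>`/`<style>` content. `strip_html` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (a rough stand-in for an HTML char filter)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Emit a space at each tag so "<p>a</p><p>b</p>" does not become "ab".
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Hypothetical helper: return only the text content of `text`."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered text
    # Collapse whitespace so tagged and untagged input normalise the same way.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>Hola <b>mundo</b></p>"))
    print(strip_html("Hola mundo"))
```

Both calls print `Hola mundo`, so content that arrives with or without tags ends up as the same text before analysis. Note that a tag in the middle of a word (`wo<b>rd</b>`) gets split by the inserted space in this sketch.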
For example, I receive English content with or without HTML tags, and Spanish content with or without HTML tags. In both cases I need to index only the actual text. I also assume that a language-specific analyzer handles English tokens by default, because my content often contains English sentences even when it is classified as another language.
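A minimal sketch of one way to combine the two, assuming this is Elasticsearch (7.x or later, typeless mappings). As far as I know, the built-in `english`/`spanish` analyzers do not accept a `char_filter`, so the usual approach is to rebuild them as custom analyzers (following the pattern in the language-analyzer docs) and add the built-in `html_strip` char filter. The analyzer names `html_english`/`html_spanish`, the field names `content_en`/`content_es`, and the index name `my-index` are hypothetical. The block only builds the JSON request body; it does not talk to a cluster.

```python
import json


def build_index_body() -> dict:
    """Hypothetical helper: index settings + mappings combining html_strip with language analysis."""
    filters = {
        "english_stop": {"type": "stop", "stopwords": "_english_"},
        "english_stemmer": {"type": "stemmer", "language": "english"},
        "english_possessive_stemmer": {"type": "stemmer", "language": "possessive_english"},
        "spanish_stop": {"type": "stop", "stopwords": "_spanish_"},
        "spanish_stemmer": {"type": "stemmer", "language": "light_spanish"},
    }
    analyzers = {
        # Rebuilt English analyzer; html_strip runs on the raw text before tokenizing.
        "html_english": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "tokenizer": "standard",
            "filter": ["english_possessive_stemmer", "lowercase",
                       "english_stop", "english_stemmer"],
        },
        # Rebuilt Spanish analyzer with the same HTML stripping step.
        "html_spanish": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "tokenizer": "standard",
            "filter": ["lowercase", "spanish_stop", "spanish_stemmer"],
        },
    }
    mappings = {
        "properties": {
            # One text field per language, each bound to its own analyzer.
            "content_en": {"type": "text", "analyzer": "html_english"},
            "content_es": {"type": "text", "analyzer": "html_spanish"},
        }
    }
    return {
        "settings": {"analysis": {"filter": filters, "analyzer": analyzers}},
        "mappings": mappings,
    }


if __name__ == "__main__":
    # Body for a request such as PUT /my-index (hypothetical index name).
    print(json.dumps(build_index_body(), indent=2))
```

If each language lives in its own index rather than its own field, the same idea applies with only that language's analyzer defined there. On the English-sentences question: both analyzers use the `standard` tokenizer, so English words inside Spanish text are still tokenized and indexed; they simply get Spanish stopword removal and stemming instead of English ones.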
Thanks for the help.