Language and HTML analyzer

I need more clarity on language analyzers and HTML filtering. My content sometimes comes wrapped in HTML tags, which I need to strip out during indexing. The content also varies by language: I create a mapping for each language and have to use the appropriate analyzer. How do I combine these?

For example, I get English content with or without HTML tags, and Spanish content with or without HTML tags. I need to index only the actual content. I also assume that a language-specific analyzer handles English tokens by default, because my content does contain English sentences even when it is classified as some other language.

Thanks for the help.

It sounds like you need to do a bit of filtering beforehand and send different languages into different indices, each with its own analysers.

Once the documents are in (e.g.) English and Spanish language indices, you can then just run your analysers.
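
To make the "filter beforehand, then route by language" idea concrete, here is a minimal sketch in Python using only the standard library. The index names (content_en, content_es) and the document keys (lang, body) are assumptions made up for illustration; they are not from this thread or from any Elasticsearch convention.

```python
from html.parser import HTMLParser

# Hypothetical index names; use whatever naming scheme suits your cluster.
INDEX_BY_LANG = {"en": "content_en", "es": "content_es"}


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return text with tags removed; plain text passes through unchanged."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join with spaces so adjacent block elements do not fuse into one word,
    # then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def route(doc):
    """Return (index_name, cleaned_doc) for a dict with 'lang' and 'body' keys."""
    index = INDEX_BY_LANG[doc["lang"]]
    return index, {**doc, "body": strip_html(doc["body"])}
```

Calling route({"lang": "es", "body": "<p>Hola <b>mundo</b></p>"}) would pick the Spanish index and hand back the document with only the text left in body. If you would rather not pre-process on the client, stripping the tags at analysis time is the other option.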

Thanks Mark.

Should I send different languages to different indices? I thought of using type mappings for each language within the same index. Can't I set the analyzers per type?

Also, with respect to the HTML tags, I thought of using the html_strip char filter. Won't that help?
URL Referred:

I would. It just keeps the logical domains cleaner and lets you play with analysis on a per-language basis.
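
For what it's worth, the html_strip char filter and language-specific token filters can be combined in one custom analyzer per index. Below is a rough sketch that builds the index body as a plain Python dict (no client library involved). The function name language_index_body, the analyzer name html_text, and the field name body are made up for illustration, and the typeless "mappings" layout assumes a recent Elasticsearch version. The filter chain only approximates the built-in language analyzers; for instance, it leaves out the English possessive stemmer.

```python
def language_index_body(language):
    """Build index settings/mappings for one language ('english' or 'spanish')."""
    stop = f"{language}_stop"        # name of the custom stop filter
    stemmer = f"{language}_stemmer"  # name of the custom stemmer filter
    # Stemmer variants; raises KeyError for languages not listed here.
    stemmer_name = {"english": "english", "spanish": "light_spanish"}[language]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    stop: {"type": "stop", "stopwords": f"_{language}_"},
                    stemmer: {"type": "stemmer", "language": stemmer_name},
                },
                "analyzer": {
                    # Hypothetical analyzer name: strips HTML first,
                    # then applies the language-specific token filters.
                    "html_text": {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        "filter": ["lowercase", stop, stemmer],
                    }
                },
            }
        },
        "mappings": {
            "properties": {"body": {"type": "text", "analyzer": "html_text"}}
        },
    }
```

The dict returned by language_index_body("spanish") could then be sent as the request body when creating the Spanish index, and likewise for English. Note that a Spanish analyzer like this still tokenizes English sentences; it simply applies Spanish stop words and stemming to them.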