Remove HTML tags from Logstash rss input plugin output

I am using the Logstash rss input plugin to index RSS feeds in Elasticsearch, but the content I get contains both text and HTML tags. I only want the text, not the HTML tags.
Can anyone tell me which filter plugin I should use and how to configure it?

I don't think anyone has written a filter for this.

What about grok? I am not sure whether it will work.

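I wouldn't recommend grok for this task. The xml filter will let you extract the article text but it'll still contain HTML markup. Perhaps you can handle it in the Elasticsearch analysis chain instead, for example with the built-in html_strip character filter, which removes HTML markup before the text is tokenized? Note that this only affects the indexed terms; the stored _source would still contain the tags.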
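
If removing the tags inside Logstash itself is preferred, one rough option is the mutate filter's gsub setting, which replaces every regex match in a field. The following is only a sketch under assumptions: it assumes the feed content ends up in the `message` field (check your own events), and the regex is a crude pattern that drops anything between `<` and `>`. It does not decode entities such as `&amp;` and can misfire on text that legitimately contains angle brackets.

```
filter {
  mutate {
    # Assumption: the RSS item body is in the "message" field.
    # Remove anything that looks like an HTML tag, e.g. <p> or </a>.
    gsub => ["message", "<[^>]*>", ""]
  }
}
```

Each gsub entry is a triple of field name, regular expression, and replacement string, so further fields can be cleaned by appending more triples to the same array.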
