Remove HTML tags from Logstash rss input plugin output


(jamal ) #1

I am using the Logstash rss input plugin to index RSS feeds in Elasticsearch, but the indexed content contains both text and HTML tags. I want only the text, without the HTML tags.
Can anyone tell me which filter plugin I should use and how to configure it?


(Magnus Bäck) #2

I don't think anyone has written a filter for this.


(jamal ) #3

What about grok? I am not sure whether it would work.


(Magnus Bäck) #4

I wouldn't recommend grok for this task. The xml filter will let you extract the article text but it'll still contain HTML markup. Perhaps you can use a token filter in Elasticsearch to remove unwanted tokens that stem from HTML?
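
To illustrate why a parser is a better fit than a pattern matcher like grok here, this is a minimal Python sketch of tag stripping done outside Logstash, for example as a pre- or post-processing step. It uses only the standard library; the names TextExtractor and strip_html are hypothetical helpers invented for this example, not part of Logstash or Elasticsearch. Note that it keeps the contents of script and style elements as plain text, which may or may not be what you want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are never stored
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b> &amp; more</p>"))
```

Something along these lines could be applied to the article text after the xml filter has extracted it, assuming you have a place in your pipeline to run custom code.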

