How to stop the RSS feed plugin from indexing duplicate posts

I'm currently playing with the RSS plugin for Logstash to index articles from my website. The plugin works well, but I notice that with an interval of about 30 seconds, Logstash re-indexes the same posts over and over again.

I guess my question is: how can I get Logstash to index only new articles that have been published to my RSS feed, rather than indexing the entire RSS feed every time the interval elapses?

Thank you

Hi Max,

Better late than never :slight_smile:

I assume the items in your feed contain fields whose values change between polls. I had a similar issue with a feed where the "updated" field changed frequently. If the changing field is a bug and you control the feed, you can fix the feed; otherwise you can patch the RSS plugin for Logstash. Take a look here. Even without Ruby experience, the plugin is pretty easy to understand. In my case, I just removed the "updated" field and the duplicate entries were gone.
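To illustrate why a single changing field matters: if duplicates are detected by comparing item content (for example by hashing the item into a fingerprint; this mechanism is an assumption, the post does not say how duplicates were detected), then an item whose "updated" value differs on every poll looks like a brand-new item each time. The helper `item_fingerprint` and the sample items below are hypothetical.

```ruby
require "digest"
require "json"

# Hypothetical helper: fingerprint an RSS item by hashing its fields,
# optionally ignoring some of them. Keys are sorted so the hash does not
# depend on field order.
def item_fingerprint(item, ignore: [])
  kept = item.reject { |key, _| ignore.include?(key) }
  Digest::SHA256.hexdigest(JSON.generate(kept.sort.to_h))
end

# The same article as seen on two consecutive polls; only "updated" differs.
# Field names and values are made up for illustration.
first_poll  = { "title" => "Hello", "link" => "https://example.com/hello",
                "updated" => "2016-01-01T10:00:00Z" }
second_poll = { "title" => "Hello", "link" => "https://example.com/hello",
                "updated" => "2016-01-01T10:00:30Z" }

# With "updated" included, the two polls look like different items.
puts item_fingerprint(first_poll) == item_fingerprint(second_poll) # prints false

# Ignoring "updated" makes the fingerprint stable across polls.
puts item_fingerprint(first_poll, ignore: ["updated"]) ==
     item_fingerprint(second_poll, ignore: ["updated"])            # prints true
```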

Best regards,
Yevgeniy

UPDATE:
If this is your only problem with the RSS plugin, you probably do not need to fork or patch the plugin. You can use the mutate filter in your Logstash configuration file to remove the changing field. In my case (the changing field is "updated"), the configuration looks like this:

```
filter {
    mutate {
        remove_field => [ "updated" ]
    }
}
```
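A minimal sketch of the effect across repeated polls, assuming some downstream step only keeps events whose content it has not seen before. Logstash does not do that step on its own; `new_events`, `remove_fields` and the sample polls are hypothetical stand-ins, with `remove_fields` mimicking what `mutate { remove_field => [ "updated" ] }` does to each event.

```ruby
require "set"

# Mimics mutate/remove_field: returns a copy of the event without the given fields.
def remove_fields(event, fields)
  event.reject { |key, _| fields.include?(key) }
end

# Hypothetical dedup step: strip the changing field, then keep only events
# not seen on any earlier poll. Set#add? returns nil if the event was already present.
def new_events(poll, seen, drop: ["updated"])
  poll.map { |item| remove_fields(item, drop) }
      .select { |event| seen.add?(event) }
end

seen = Set.new
poll1 = [
  { "title" => "Post A", "link" => "https://example.com/a", "updated" => "10:00:00" }
]
poll2 = [
  { "title" => "Post A", "link" => "https://example.com/a", "updated" => "10:00:30" },
  { "title" => "Post B", "link" => "https://example.com/b", "updated" => "10:00:30" }
]

p new_events(poll1, seen).map { |e| e["title"] } # prints ["Post A"]
p new_events(poll2, seen).map { |e| e["title"] } # prints ["Post B"]
```

Without the `remove_fields` step, "Post A" would be emitted again on the second poll because its "updated" value changed.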