Duplicates with Logstash rss plugin

Hello,

I am currently setting up an RSS feed with the Logstash rss input plugin.
This is my Logstash pipeline config:

input {
    rss {
        url => ""
        interval => 7200
        tags => ["rss"]
    }
}
filter {
    fingerprint {
        source => "title"
    }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "rss-feeds"
        user => ""
        password => ""
    }
    stdout { codec => rubydebug }
}

To remove duplicates, I used the fingerprint filter in the filter section, but unfortunately I still get duplicates.
I hope somebody has other suggestions to solve this.

Best regards,
Robin

A fingerprint filter by itself does not eliminate duplicates. It just creates a hash that you can use to eliminate duplicates.

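filter {
    fingerprint {
        source => "title"
        # Set the target explicitly: with ECS compatibility enabled the
        # default target is [event][hash], not "fingerprint".
        target => "fingerprint"
    }
}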
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        document_id => "%{fingerprint}"
        index => "rss-feeds"
        user => ""
        password => ""
    }
}

That way a new version of a document with the same title will overwrite any previous version.
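
One side effect of hashing only the title is that two different articles with the same title would also overwrite each other. A possible variation, sketched below, hashes the title together with the item link and keeps the hash in @metadata so it is not stored in the indexed document. This assumes the rss input emits a "link" field for each item; check the rubydebug output of your own feed before relying on it. The SHA256 method and the [@metadata][fingerprint] field name are illustrative choices, not something from the config above.

```
filter {
    fingerprint {
        # Hash title and link together (assumes the feed items have a "link" field)
        source => ["title", "link"]
        concatenate_sources => true
        method => "SHA256"
        # @metadata fields are available in the pipeline but are not sent to the output
        target => "[@metadata][fingerprint]"
    }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "rss-feeds"
        # Same hash => same document ID => the existing document is overwritten
        document_id => "%{[@metadata][fingerprint]}"
        user => ""
        password => ""
    }
}
```

Documents that were indexed before document_id was set keep their auto-generated IDs, so existing duplicates stay in the index until they are deleted or the feed is reindexed.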

Thanks for your response. I will try it.
