Logstash rss plugin not storing field

I have successfully set up the logstash-rss-plugin and it's working fine, except that one field is not being stored. I need it because the feed items are updated regularly, and without it I am getting duplicate entries in Elasticsearch. This is how the field is shown when I curl the feed from the command line.

<guid isPermaLink="false">A387658297364928.html</guid>

Can someone help me add this as a field in Elasticsearch? I'm guessing it has something to do with my Logstash config. BTW, I have successfully grok'd the message field and set up the time fields.

Found this two-year-old git issue. Can I get some help? :)

bueller?

Surely there is someone who is able & willing to help!

Still holding out hope!

bump.

bump

bump again. Not going away!

This is either simple or complex to fix, depending on whether we want a more flexible solution, i.e. an extension that allows more than just the guid to be added to an event.

With a bit of fiddling in irb, Ruby's REPL, I see this:

feed.items[0].guid
=> #<RSS::Rss::Channel::Item::Guid:0x1338fb5 @parent=nil, @converter=nil, @content="\nhttp://liftoff.msfc.nasa.gov/2003/05/27.html#item571\n", @isPermaLink=nil, @do_validate=true>
feed.items[0].guid.content.strip
=> "http://liftoff.msfc.nasa.gov/2003/05/27.html#item571"

Bummer, the guid is an object instance, not a primitive value. I suppose that is because it is defined to have an isPermaLink attribute. The source and enclosure elements have attributes too. That is why it's hard to add some sort of generic behaviour driven from the config.
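If you want to reproduce that outside of Logstash, here is a minimal sketch using Ruby's rss library. The feed XML, the FEED_XML and GUIDS constants, and the guid_string helper are all made up for illustration; only the guid value comes from the question above. On newer Rubies rss ships as a bundled gem, so you may need to install it first.

```ruby
require "rss"

# Hypothetical feed, invented for this example. The first item carries the
# guid from the question; the second item has no guid element at all.
FEED_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
      <description>Hypothetical feed for illustration</description>
      <item>
        <title>First item</title>
        <link>http://example.com/first</link>
        <description>Body of the first item</description>
        <guid isPermaLink="false">
          A387658297364928.html
        </guid>
      </item>
      <item>
        <title>Item without guid</title>
        <description>No guid element here</description>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: turn the Guid object into a plain string,
# or nil when the item has no guid (or an empty one).
def guid_string(item)
  guid = item.guid
  return nil if guid.nil?

  content = guid.content
  content.nil? ? nil : content.strip
end

# The second argument (false) switches off validation of the feed.
feed = RSS::Parser.parse(FEED_XML, false)
GUIDS = feed.items.map { |item| guid_string(item) }
```

The nil check matters because not every feed item has a guid, and calling content on nil would raise.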

In the rss input code we have:

  def handle_rss_response(queue, item)
    @codec.decode(item.description) do |event|
      event.set("Feed",  @url)
      event.set("published", item.pubDate)
      event.set("title", item.title)
      event.set("link", item.link)
      event.set("author", item.author)
      decorate(event)
      queue << event
    end
  end

If you want a quick fix, and because it's Ruby, you can edit the rss input code in your Logstash installation.
Find the file that contains the string def handle_rss_response under the logstash/vendor folder.
Clue: it's probably at logstash/vendor/bundle/jruby/?/gems/logstash-input-rss-?/lib/logstash/inputs/rss.rb
Edit this file.
Add this line before decorate(event):

      event.set("guid", item.guid.content.strip) unless item.guid.nil?

Note: if you update the plugin or Logstash, you will lose the edit and need to redo it, unless the rss input plugin update contains the fix.
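If you would rather not hunt for the file by hand, something like the sketch below could do the search for you. The files_containing helper is hypothetical (not part of Logstash), and the logstash/vendor root in the usage comment is an assumption about where your install lives.

```ruby
require "find"

# Hypothetical helper: returns the sorted paths of all .rb files under
# root whose contents include needle.
def files_containing(root, needle)
  matches = []
  Find.find(root) do |path|
    next unless File.file?(path) && path.end_with?(".rb")

    begin
      # binread avoids encoding errors on files that are not valid UTF-8.
      matches << path if File.binread(path).include?(needle)
    rescue SystemCallError
      # Unreadable file (permissions, broken link): skip it.
      next
    end
  end
  matches.sort
end

# Usage, assuming Logstash is unpacked in ./logstash:
#   puts files_containing("logstash/vendor", "def handle_rss_response")
```

Run it with the Ruby you have handy; it only reads files and changes nothing.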

After the edit, the handle_rss_response method should look like this:

  def handle_rss_response(queue, item)
    @codec.decode(item.description) do |event|
      event.set("Feed",  @url)
      event.set("published", item.pubDate)
      event.set("title", item.title)
      event.set("link", item.link)
      event.set("author", item.author)
      event.set("guid", item.guid.content.strip) unless item.guid.nil?
      decorate(event)
      queue << event
    end
  end

Restart Logstash and look for a load error on startup. No error means all is good; if you get an error, paste it here.
Hope this helps.


I added this to my config with no errors, I'll let you know if it worked tomorrow when the new index has been created. Thanks!

Working great, thanks!!

