Error with logstash-input-rss

I am trying the RSS input plugin with Logstash and am hitting the error detailed below:

Version:
Logstash 7.7.1
Operating System:
CentOS 7.8.2003
Config File:

    input {
      rss {
        url => "https://www.alittihad.ae/arabi.rss"
        interval => 7200
        tags => ["ar", "rss", "alittihad"]
      }
    }

    filter {
      fingerprint {
        source => "title"
        target => "[@metadata][fingerprint]"
        method => "MURMUR3"
      }
    }

    output {
      elasticsearch {
        action => "index"
        hosts => "localhost"
        workers => 1
        document_id => "%{[@metadata][fingerprint]}"
      }
      stdout {}
    }

Error:
[ERROR][logstash.inputs.rss ][main][28d92d7fa1e60631ff0741642cbb39e90c8c542b6251b01324014030be140f75] Uknown error while parsing the feed {:url=>"https://www.alittihad.ae/arabi.rss", :exception=>#<RSS::MissingAttributeError: attribute <url> is missing in tag <source>>}

If you open that URL in a browser, you will see source elements such as

    <source>
    <![CDATA[ ]]>
    </source>

The RSS spec requires that a source element have a url attribute, so this is not valid RSS.

Thank you for your response! I am aware that the RSS is faulty. However, I tried another RSS document (https://www.emaratalyoum.com/1.533091?ot=ot.AjaxPageLayout) which didn't have a <source> tag, and it worked fine.

Is there any chance such errors can be bypassed in the plugin?

The input does not have a way to suppress the error.

What do you think could be a better way of dealing with this? I'd really like to use Logstash in my pipeline.

You might be able to use mutate+gsub to remove the empty source elements.
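
A rough, untested sketch of that idea follows. Filters only run on events that an input has already produced, so this assumes the feed is fetched as raw text with the http_poller input instead of the rss input. The "alittihad" key name, the 2h schedule (matching the 7200 second interval above), and the regular expression are illustrative assumptions, not something taken from the plugin documentation for this feed. The whole feed arrives as one event in the "message" field, and splitting it into individual items (for example with the xml and split filters) is left out here.

    input {
      http_poller {
        # Fetch the raw feed body; "alittihad" is just a label for this URL
        urls => {
          alittihad => "https://www.alittihad.ae/arabi.rss"
        }
        schedule => { every => "2h" }
        codec => "plain"
      }
    }

    filter {
      mutate {
        # Remove source elements that contain only an empty CDATA section
        gsub => [
          "message", "<source>\s*<!\[CDATA\[\s*\]\]>\s*</source>", ""
        ]
      }
    }

The pattern deliberately matches only source elements whose body is an empty CDATA section, so any well-formed source element that carries a url attribute would be left untouched.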

Thanks for the advice. I'll see what I can do about it.
