Hello everyone,
I am currently learning Elasticsearch and Logstash, and I have a task to complete.
I want to parse a Google News RSS feed (e.g. google news feed) and index the data from every item.
The RSS feed looks like this:
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title></title>
      <link></link>
      <guid isPermaLink="false"></guid>
      <pubDate></pubDate>
      <description></description>
      <source></source>
    </item>
    <item>...</item>
  </channel>
</rss>
The thing is I tried to use the RSS input plugin, but some tags (like
This is the config I used:
input {
  rss {
    url => "https://news.google.com/rss/search?&scoring=n&num=10&q=intitle:Effondrement%22&hl=fr&gl=FR&ceid=FR:fr"
    interval => 600
    tags => ["Effondrement"]
  }
  rss {
    url => "https://news.google.com/rss/search?&scoring=n&num=10&q=intitle:Foudre%22&hl=fr&gl=FR&ceid=FR:fr"
    interval => 600
    tags => ["Foudre"]
  }
  ...
}
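filter {
  # Hash the title into a "fingerprint" field
  fingerprint {
    source => "title"
    method => "MURMUR3"
    target => "fingerprint"
  }
  mutate {
    # Replace each <a> element with its link text
    gsub => [
      "message", "<a[^>]*>(.*?)<\/a>", "\1",
      "message", " ", ""
    ]
    # Keep the cleaned text as "description", then drop the originals
    copy => { "message" => "description" }
    remove_field => ["message", "event"]
  }
}
output {
  ...
}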
I then tried http_poller, but it did not work properly: I ended up with only two documents, each containing the whole feed in a single field.
This is the config I used:
input {
  http_poller {
    urls => {
      effondrement => "https://news.google.com/rss/search?&scoring=n&num=10&q=intitle:Effondrement%22&hl=fr&gl=FR&ceid=FR:fr"
      explosion => "https://news.google.com/rss/search?&scoring=n&num=10&q=intitle:Explosion%22&hl=fr&gl=FR&ceid=FR:fr"
    }
    request_timeout => 60
    schedule => { "every" => "1h" }
    codec => multiline {
      pattern => "<item>"
      negate => true
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "item/title/text()", "title"
    ]
  }
}
output {
  ...
}
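For reference, here is one direction that might be worth trying, written as an untested sketch and not a confirmed solution. The idea is to let http_poller hand over the whole response body as a single event, parse it with the xml filter, and then use the split filter to produce one event per item. The target name "parsed" and the choice of fields to rename are assumptions made for illustration, and the stdout output is only there to inspect the result.

```
input {
  http_poller {
    urls => {
      effondrement => "https://news.google.com/rss/search?&scoring=n&num=10&q=intitle:Effondrement%22&hl=fr&gl=FR&ceid=FR:fr"
    }
    request_timeout => 60
    schedule => { "every" => "1h" }
    # plain keeps the whole response body in "message"
    codec => "plain"
  }
}
filter {
  # Parse the full feed into a nested structure under "parsed" (hypothetical name)
  xml {
    source => "message"
    target => "parsed"
    force_array => false
    remove_field => ["message"]
  }
  # Emit one event per <item>; assumes the feed holds at least two items,
  # otherwise "item" is a single hash and not an array
  split {
    field => "[parsed][channel][item]"
  }
  # Promote a few item fields to the top level and drop the rest of the feed
  mutate {
    rename => {
      "[parsed][channel][item][title]" => "title"
      "[parsed][channel][item][link]" => "link"
      "[parsed][channel][item][pubDate]" => "pubDate"
    }
    remove_field => ["parsed"]
  }
}
output {
  # Print events for inspection only
  stdout { codec => rubydebug }
}
```

With force_array set to false, elements that carry attributes (such as guid and source) should come out as small hashes instead of plain strings, so they may need extra handling before indexing.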
I am probably doing it wrong, so if anyone can show me the proper way to do what I want, I would greatly appreciate it.
Thanks!
PS: I am French, so please excuse any mistakes in my message.