Http_poller only outputs hash?

Below is my pipeline. I'm trying to pull a list from a site and then parse it out with the xml filter.

input {
  http_poller {
    urls => {
      isctop100 => "https://isc.sans.edu/api/topips/records/100?xml"
    }
    schedule => {"every" => "1s"}
    target => "data"
    codec => multiline {
      pattern => "<ipaddress>"
      what => "next"
    }
  }
}
filter {
  xml {
    source => "data"
    store_xml => false
    xpath => [
      "ipaddress/rank/text()", "Rank",
      "ipaddress/source/text()", "IPAddress",
      "ipaddress/reports/text()", "Reports",
      "ipaddress/targets/text()", "Targets"
    ]
  }
  if "?xml version" or "topips" in [data] {
    drop { }
  }
}

I get the following error in the Logstash logs: "XML filter expects a string but received a Hash"

I can see that the XML filter is being fed a hash, and I suspect the http_poller input is the cause. Does anybody have a definitive answer?

The target directive for http_poller is a little... weird; I would avoid it and rely on the codec putting the message in the message field. If you need it to end up in data, you can rename it immediately after the input in a filter:

input {
  http_poller {
    # ... (no target directive)
  }
}
filter {
  mutate {
    rename => { "message" => "data" }
  }
}
filter {
  xml {
    source => "data"
    # ...
  }
}

Without the target directive, an Event is created by the codec; most simple codecs (like multiline) will capture the message and create an Event that looks something like:

{
  "message" => "the parsed message",
  "@timestamp" => Timestamp.current,
  "@metadata` => {
    # ...
  }
}

When the target is set, the codec still creates the above Event, but then http_poller converts the Event to a Hash (which throws out the @metadata), and puts the result in a new event at the target address:

{
  "@timestamp" => Timestamp.current, # won't necessarily match the inner timestamp
  "data" => {
    "message" => "the parsed message",
    "@timestamp" => Timestamp.current
  },
  "@metadata` => {
    # ... _new_ metadata
  }
}
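
For what it's worth, given that nesting, the other way to make your original pipeline work would be to point the xml filter at the nested field instead of at the target itself. I haven't tested this against your feed, and the field path below is an assumption based on the structure above:

filter {
  xml {
    # with target set, the codec's output is nested under the target,
    # so the string lives at [data][message] rather than [data]
    source => "[data][message]"
    # ...
  }
}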

Good stuff yaauie, thanks for the info. I BELIEVE I started without the target option and was encountering the same issue. Unfortunately, after I requested information from the link provider, they took the feed down for maintenance; I guess I notified them of an issue they weren't aware of. So right now I have nothing to test against, nor the time to go out and find another feed, though I might be able to later tonight.

Another question: if the page is only updated, say, once a day, but I poll it every hour, does that mean I will have duplicate entries, or does http_poller (or some other input/codec/filter) have the ability to track and process only changed data?

No, but if you save the data in an ES document with a fixed name you'll overwrite the same document again and again.

Interesting; that leaves me with two questions.

  1. How do you save the document into ES with a fixed name?

  2. Guess it depends on how you accomplish number 1, but how could you configure it to create a new document at a given interval?

Use the elasticsearch output's document_id option. Not sure I understand why you want to create a new document at fixed intervals (regardless of whether the source has changed), but you could do e.g.

document_id => "someprefix-%{+YYYYMMdd}"

to create a new document once a day.
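
In context, that option belongs in the elasticsearch output. A minimal sketch (the hosts and index name below are placeholders, not taken from your setup):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "isc-top100"
    # the sprintf date comes from the event's @timestamp, so a new
    # document id (and therefore a new document) is generated once a day
    document_id => "isctop100-%{+YYYYMMdd}"
  }
}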
