How to parse XML?

I'm trying to parse the following XML, returned by an API, with the xml filter plugin.

<Journey fpTime="12:33" fpDate="02.11.19" delay="0" e_delay="0" platform="102" targetLoc="abc" dirnr="8001055" prod="S     60#S" dir="Böblingen" administration="800643" depStation="Stuttgart Hbf &#x0028;tief&#x0029;" is_reachable="0" delayReason=" " approxDelay="0"></Journey>
<Journey fpTime="12:33" fpDate="02.11.19" delay="0" e_delay="0" platform="101" targetLoc="Stuttgart Schwabstr." dirnr="8006698" prod="S      5#S" dir="Stuttgart Schwabstr." administration="800643" depStation="Stuttgart Hbf &#x0028;tief&#x0029;" is_reachable="0" delayReason=" " approxDelay="0"></Journey>

My config file looks like this:

input {
  http_poller {
    urls => {
     
      test2 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
         url => "https://reiseauskunft.bahn.de/bin/stboard.exe/dn?rt=1&input=Stuttgart%20Hauptbahnhof&boardType=dep&L=vs_java3&productsFilter=111110111&start=yes&maxJourneys=20"
        headers => {
          Accept => "application/json"
        }
     }
    }
    request_timeout => 20
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC"}
    codec => "plain"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}
filter {

  ## interpret the message payload as XML
  xml {
    source => "message"
    target => "parsed"
    store_xml => true
    force_array => false
  }

  split {
    field => "[parsed][Journey]"
    add_field => {
      ## generate a unique id for the station # X the sensor time to prevent duplicates
      id                => "%{[parsed][Journey][fpTime]}-%{[parsed][Journey][fpDate]}-%{[parsed][Journey][dirnr]}"
      targetStationName => "%{[parsed][Journey][targetLoc]}"
      time              => "%{[parsed][Journey][fpTime]}"
      dir               => "%{[parsed][Journey][dirnr]}"
      jDate             => "%{[parsed][Journey][fpDate]}"
      e_delay           => "%{[parsed][Journey][e_delay]}"
      depStation        => "%{[parsed][Journey][depStation]}"
      delayReason       => "%{[parsed][Journey][delayReason]}"
      administration    => "%{[parsed][Journey][administration]}"
      prod              => "%{[parsed][Journey][prod]}"
      platform          => "%{[parsed][Journey][platform]}"
    }
  }
    mutate {
    ## Convert the numeric fields from strings to the appropriate data type
    convert => {
      "e_delay"  => "integer"
    }
    ## combine the date and time values into a single fullDate field
    add_field => { "fullDate" => [ "%{[date]}", "%{[time]}" ]}
    ## get rid of the extra fields we don't need
    remove_field => [ "message", "parsed", "http_poller_metadata"]
  }
  ## use the embedded Unix timestamp 
 date {
    match => ["fullDate", "UNIX_MS"]
    remove_field => ["jDate"]
  }
}

I want to parse each <Journey> element separately, but I get the error message :exception=>#<REXML::ParseException: missing attribute quote
Line: 1
Position: 6750
Last 80 unconsumed characters:

If you supplied the header 'Accept => "application/json"' I would be surprised if the API returned XML.

If it does return the XML you showed then there will not be a [parsed][Journey] field, so your split filter does nothing. "%{[parsed][Journey][delayReason]}" should be "%{[parsed][delayReason]}" etc.
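
To illustrate the point about field names, here is a minimal, untested sketch. It assumes each event's message holds exactly one <Journey> element and reuses the xml filter settings from the question. In that case the attributes would sit directly under [parsed], so the references would look something like this:

filter {
  xml {
    source => "message"
    target => "parsed"
    store_xml => true
    force_array => false
  }
  mutate {
    add_field => {
      # Assumption: <Journey> is the root element, so there is no [Journey] level
      "targetStationName" => "%{[parsed][targetLoc]}"
      "delayReason"       => "%{[parsed][delayReason]}"
      "platform"          => "%{[parsed][platform]}"
    }
  }
}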

None of that explains the missing attribute quote error.

I was able to fix that error. My config now looks like this:

input {
  http_poller {
    urls => {
     
      test2 => {
        method => get
         url => "https://reiseauskunft.bahn.de/bin/stboard.exe/dn?rt=1&input=Stuttgart%20Hauptbahnhof&boardType=dep&L=vs_java3&productsFilter=111110111&start=yes&maxJourneys=20"
     
      }
    }

    codec => multiline {
      pattern => "<Journey"
      negate => true
      what => "previous"
      charset => "ISO-8859-1"
      auto_flush_interval => 1
    }
	 
    request_timeout => 20
    schedule => { cron => "* * * * * UTC"}
	
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}
filter {

  xml {
    remove_namespaces => true
    source => "message"
    force_array => false
    target => "msg"
    xpath => ["/Journey/@fpTime", "jTime"]
  }
  
}

When I run Logstash I get the following error. Do you know how to fix this?

exception=>#<REXML::ParseException: #<RuntimeError: attempted adding second root element to document>
Line: 1
Position: 7305

I believe that is what you get if you try to parse something like

<item>foo</item><item>bar</item>

An xml filter will only work if the input is something like

<items><item>foo</item><item>bar</item></items>

And which type of filter can I use instead?

Use mutate+gsub to add an element wrapping both of the existing elements.

OK, and how can I add an element at the beginning and at the end of the message?
filter {
mutate { gsub => [ "message", "Start of Message", "" ] }
}

Use anchors:

mutate { gsub => [ "message", "^", "<a>", "message", "$", "</a>" ] }
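
Putting the pieces of this thread together, a rough, untested sketch of the filter section might look like the following. It assumes the whole response arrives as a single-line message containing several <Journey> elements and no XML declaration. The wrapper name <a> is arbitrary, and the target name "parsed" is taken from the first config. If the message could span several lines, the anchors \A and \z (start and end of the whole string in Ruby regular expressions) may be a safer choice than ^ and $, which match at every line.

filter {
  # Wrap the payload in a single root element so the XML parser accepts it
  mutate { gsub => [ "message", "^", "<a>", "message", "$", "</a>" ] }

  xml {
    source => "message"
    target => "parsed"
    store_xml => true
    force_array => false
  }

  # Emit one event per <Journey> element (assumes there is more than one,
  # so that [parsed][Journey] is an array)
  split { field => "[parsed][Journey]" }

  mutate {
    add_field => {
      # After the split, [parsed][Journey] holds the attributes of one element
      "time"     => "%{[parsed][Journey][fpTime]}"
      "jDate"    => "%{[parsed][Journey][fpDate]}"
      "platform" => "%{[parsed][Journey][platform]}"
    }
  }
}

With the wrapper in place, the [parsed][Journey] references from the first config apply again, because <Journey> is no longer the root element.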
