Problem with xml filter

We're having issues setting a date pulled from an XML document as @timestamp using the date filter.

The input XML looks like this:

<Task>
  <TaskId>ServerTasks-5017</TaskId>
  <TaskState>Success</TaskState>
  <Created>2015-12-22T08:20:03</Created>
  <QueueTime>2015-12-22T08:20:03</QueueTime>
  <StartTime>2015-12-22T08:20:06</StartTime>
  <CompletedTime>2015-12-22T08:21:11</CompletedTime>
  <DurationSeconds>68</DurationSeconds>
</Task>

The filter looks like this:

filter {
  xml {
    source => "message"
    target => "xml_content"
    xpath => [ "//Created/text()", "created" ]
  }
  date {
    match => [ "created", "yyyy-MM-dd'T'HH:mm:ss" ]
    #match => [ "xml_content.Created", "yyyy-MM-dd'T'HH:mm:ss" ]
  }
}

As you can see, I've tried matching both the field extracted via XPath and the content pulled from the XML directly.

Both end up with a _dateparsefailure tag, and @timestamp is set to the Logstash processing time.

Any tips for a novice logstash user on how to resolve this?

I'm not XPath-fluent, but shouldn't the expression be //Task/Created/text()? What does the resulting event look like? Use a stdout { codec => rubydebug } output. Also, what's in the logs? When the date filter fails it tells you why in the log.

@timestamp and created look like this:

"@timestamp" => 2017-03-30T18:29:09.354Z,
"created" => [
[0] "2015-12-22T08:20:03"
],
"@version" => "1",

I might be looking in the wrong log, but I can't find anything relating to _dateparsefailure in it. I even tried setting the log level to debug.

The problem is that created is an array. Disable the xml filter's force_array option.

Thanks for your help! Much appreciated!

Even with force_array set to false, the element extracted with xpath (created) still comes out as an array. The rest of the content, which isn't extracted with xpath, is no longer in array format.

Any tips? Can I reference the first element of the created array somehow?

I can reproduce your behavior:

% bin/logstash -e 'filter { xml { source => "message" xpath => { "//Created/text()" => "foo" } store_xml => false } } '

{
  "@timestamp" => 2017-03-29T06:49:47.122Z,
  "foo"        => [
    [0] "2015-12-22T08:20:03"
  ],
...

I agree this behavior is not what I'd expect :wink:

There's an open issue on the logstash-filter-xml project that mentions this issue. Your feedback on this issue is welcome :slight_smile:

Yes.

Until the issue mentioned by @jordansissel is resolved, insert the following mutate filter after the xml filter:

mutate {
  replace => { "created" => "%{created[0]}" }
}

This replaces the array with the value of the first (and, in this case, only) element.
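Outside Logstash, the effect of that mutate can be sketched in plain Ruby (illustrative only, not Logstash internals):

```ruby
# Before the mutate filter: the XPath result is a one-element array.
event = { "created" => ["2015-12-22T08:20:03"] }

# The sprintf reference %{created[0]} resolves to the first element;
# replace then swaps the array for that scalar string.
event["created"] = event["created"][0]

event["created"]  # => "2015-12-22T08:20:03"
```

After this, the date filter sees a plain string it can parse.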

That is, instead of:

"created":["2015-12-22T08:20:03"]

the output contains:

"created":"2015-12-22T08:20:03"

Regarding the following match setting (commented-out in your example):

match => ["xml_content.Created", "yyyy-MM-dd'T'HH:mm:ss"]

From the Elastic docs topic “Accessing Event Data and Fields in the Configuration”:

The syntax to access a field is [fieldname]. If you are referring to a top-level field, you can omit the [] and simply use fieldname. To refer to a nested field, you specify the full path to that field: [top-level field][nested field].

Replace xml_content.Created with [xml_content][Created].

This works with your sample XML data in Logstash 5.2.1, with force_array => false, as recommended by @magnusbaeck.
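For reference, with the bracket syntax the commented-out match setting would look like this (a sketch against your config; untested):

date {
  match => [ "[xml_content][Created]", "yyyy-MM-dd'T'HH:mm:ss" ]
}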

The expression that you cite and the original expression will both work. However, /Task/Created/text() (note the single leading slash) is better.

From the XPath W3C recommendation:

// is short for /descendant-or-self::node()/.

A more specific path, such as /Task/Created, is typically more efficient (and less likely to select nodes that you didn’t intend to :slight_smile:).
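Both forms can be checked with Ruby's bundled REXML library (a standalone sketch, not part of the Logstash pipeline):

```ruby
require "rexml/document"

xml = "<Task><TaskId>ServerTasks-5017</TaskId>" \
      "<Created>2015-12-22T08:20:03</Created></Task>"
doc = REXML::Document.new(xml)

# //Created searches descendant-or-self from the root: it matches,
# but has to consider every node in the tree.
via_descendant = REXML::XPath.first(doc, "//Created/text()").to_s

# /Task/Created walks a rooted path straight to the element.
via_rooted = REXML::XPath.first(doc, "/Task/Created/text()").to_s

via_descendant == via_rooted  # both select "2015-12-22T08:20:03"
```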

Try this filter:

filter {
    xml {
        source => "message"
        target => "@metadata[xml_content]"
        force_array => false
    }
    # Copy XML content to first-level fields with all-lowercase names
    ruby {
        code => '
            event.get("@metadata[xml_content]").each do |key, value|
                event.set(key.downcase, value)
            end
        '
    }
    mutate {
         remove_field => ["message", "@metadata"]
         convert => {
           "durationseconds" => "integer"
         }
    }
    date {
        match => ["created", "ISO8601"]
    }
}

Notes:

  • @magnusbaeck: I thought @metadata wasn’t supposed to get passed through to the output, but it does get included in output to stdout and Elasticsearch. Hence its presence in remove_field. Did I miss a memo? (I’m using Logstash 5.2.1.)
  • That Ruby code is a workaround for the issue I describe in “Set target of xml filter to root?”
  • If you want to preserve the original case of the XML element names, remove .downcase from key.downcase
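The key-lowering step in that ruby filter boils down to a plain hash copy. In isolation, with a hand-built hash standing in for the parsed XML:

```ruby
# Stand-in for what the xml filter would store under @metadata.
xml_content = {
  "TaskId"  => "ServerTasks-5017",
  "Created" => "2015-12-22T08:20:03"
}

# Copy each entry to a lowercase key, as the ruby filter does with
# event.set(key.downcase, value).
flattened = {}
xml_content.each { |key, value| flattened[key.downcase] = value }

flattened  # => {"taskid"=>"ServerTasks-5017", "created"=>"2015-12-22T08:20:03"}
```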

Example Logstash output

In JSON format:

"@timestamp": "2015-12-22T08:20:03.000Z",
"completedtime": "2015-12-22T08:21:11",
"created": "2015-12-22T08:20:03",
"@version": "1",
"host": "58a3fe88f636",
"starttime": "2015-12-22T08:20:06",
"durationseconds": 68,
"taskid": "ServerTasks-5017",
"queuetime": "2015-12-22T08:20:03",
"taskstate": "Success"

More unsolicited tips

If you can (that is, if you are responsible for creating the original XML-format events), consider adding a zone designator to the time stamps. Otherwise, be sure that you understand the repercussions of specifying local times, and how those values might be interpreted.
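Ruby's own Time.parse illustrates why the designator matters (a sketch; the date filter's timezone option exists precisely to pin down zone-less strings like these):

```ruby
require "time"

# The same wall-clock string names different instants
# depending on the zone designator (or lack of one).
utc   = Time.parse("2015-12-22T08:20:03Z")       # explicit UTC
plus1 = Time.parse("2015-12-22T08:20:03+01:00")  # explicit UTC+1
local = Time.parse("2015-12-22T08:20:03")        # no designator: host's local zone

utc - plus1     # => 3600.0 (one hour apart)
utc.utc_offset  # => 0
```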

I have added my two cents to that issue.

This works perfectly! Thank you all for all tips!