Can I use file filter for XML docs?


(Saket Kumar) #1

input {
  file {
    path => "D:\folder*.xml"
    type => "string"
    start_position => "beginning"
  }
}

filter {
  xml {....
  }
}


(Magnus Bäck) #2

There is no file filter, so I'm assuming you're talking about the file input. Yes, this works, but the xml filter won't work unless you use the multiline filter or codec to join the whole file (i.e. all physical lines) into a single message (one logical line).
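
As a rough illustration of the codec route, the sketch below joins every line that does not start a new XML document onto the previous event, so the xml filter receives the whole document in one message. The path and the `doc` target are placeholders, the `^<\?xml` pattern assumes each file begins with an XML declaration, and `auto_flush_interval` is an assumption about the installed multiline codec version (older releases may not have it).

```conf
input {
  file {
    # Hypothetical path; replace with your own.
    path => "/path/to/*.xml"
    start_position => "beginning"
    codec => multiline {
      # Any line that does NOT start with "<?xml" is appended to the previous event.
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # Assumption: emit the pending event after 2 seconds without new lines.
      auto_flush_interval => 2
    }
  }
}

filter {
  xml {
    # Parse the joined document and store the result under the "doc" field.
    source => "message"
    target => "doc"
  }
}
```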


(Saket Kumar) #3

I have nested XML like this:

"<?xml version="1.0" encoding="UTF-8"?>

<?Status>200 <?response> <?record1> <?data1>a.. <?data2>a.. <?data3>a.. <?record1> : <?record N>... <?record N> "

I want to parse from <?response> onward and output every <?record.> as a new line.

(Saket Kumar) #4

Any suggestion for handling this?


(Rafał Trójniak) #5

Just like 'magnusbaeck' said, this will work, but you have to use the multiline filter.

Because the 'file' input treats each line as a separate event, you have to join the events using the multiline plugin, and then use the xml plugin to transform the XML into separate fields.

Here is a pretty good example of that


(Saket Kumar) #6

Thanks! I didn't understand what 'magnusbaeck' described earlier. With this example it is clearer now.
Do we need to use the multiline filter for JSON as well?


(Rafał Trójniak) #7

This depends on how the message is written. Both JSON and XML can be written as one event per line, or as one event spread over many lines.

For example

  • XML in one line:
    <test><some_field>value</some_field></test>
  • XML in multiple lines:
    <test>
      <some_field>value</some_field>
    </test>

When a single event (JSON or XML, it doesn't matter) is written across multiple lines, it generates multiple Logstash events, and you have to join them into a single event using the multiline filter.
If the whole event is written on a single line, it is already a single Logstash event and no joining is needed.
All of this is a consequence of the assumption that one event is one line of the file.
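
For the single-line case, a minimal sketch, assuming each line of the file is one complete JSON object: the json filter can parse the `message` field directly, with no multiline step. The path is a placeholder.

```conf
input {
  file {
    # Hypothetical path; replace with your own.
    path => "/path/to/*.json"
  }
}

filter {
  json {
    # Each line is already one event, so it can be parsed as-is.
    source => "message"
  }
}
```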

Regards,


(Saket Kumar) #8

Logstash config: Not parsing xml...

input {
  file {
    path => "/opt/Log/*.xml"
    type => "test"
  }
}

filter {
  multiline {
    pattern => "^\s\s(\s\s(\s\s|</data>))"
    what => "previous"
  }

  xml {
    target => "data"
    source => "message"
    add_field => {
      "testId" => "%{[data][testId]}"
      "loadTime" => "%{[data][average][0][loadTime]}"
      "TTFB" => "%{[data][average][0][TTFB]}"
    }
  }
}

output {
  elasticsearch {
    action => "index"
    host => "172.27.155.109"
    index => "xml1"
    workers => 1
  }
  stdout {}
}

XML Input:

<?xml version="1.0" encoding="UTF-8"?> <?response> <?statusCode>200 <?data> <?testId>150603_VD_9VJ <?average> <?loadTime>3594 <?TTFB>282

(Rafał Trójniak) #9

It looks to me like your multiline configuration is not correct.

Please look at my example, where:

  • First line (<?xml...) is dropped
  • XML is joined into the whole <response> block
  • XML data are extracted
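
The linked example is not preserved in this thread. A minimal sketch of the three steps, assuming the XML declaration sits on its own line and each `<response>` opening tag starts a line, might look like this. It is not the original example, and the multiline filter it uses ships only with older Logstash releases.

```conf
filter {
  # 1. Drop the XML declaration line ("?" is escaped so it matches literally).
  if [message] =~ /^<\?xml/ {
    drop {}
  }

  # 2. Append every line that does not open a new <response> block
  #    to the previous event, so one block becomes one event.
  multiline {
    pattern => "^<response>"
    negate => true
    what => "previous"
  }

  # 3. Parse the joined XML and store the result under the "data" field.
  xml {
    source => "message"
    target => "data"
  }
}
```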

Hope that helps

Rules :


Documentation:


(Saket Kumar) #10

input {
  file {
    path => "/opt/Log/*.xml"
    type => "test"
  }
}

filter {
  if [message] =~ "^<?xml .*" {
    drop {}
  }
  multiline {
    pattern => "^</response>"
    negate => true
    what => "next"
  }

  xml {
    source => "message"
    target => "data"
  }
}

output {
  stdout {}
}

I tried the above config, but no luck. Am I still missing anything?


(Saket Kumar) #11

When I run in debug mode I get:

Flushing {:plugin=><LogStash::Filters::Multiline pattern=>"^<\/response>", what=>"next", source=>"message", stream_identity=>"%{host}.%{path}.%{type}">, :level=>:debug, :file=>"(eval)", :line=>"43", :method=>"initialize"}
_discover_file_glob: /opt/Log/*.xml: glob is: ["/opt/Log/XML_webpaeSu.xml"] {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"132", :method=>"_discover_file"}

There are no exceptions.


(Rafał Trójniak) #12

Hmm. I don't know what is wrong here.
Can you push the file through Logstash with no filters and show the JSON results? Maybe there is something wrong with the input plugin or with the file itself.

Please format the output properly (as code), so that it is easier to read and less error-prone.
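
A minimal sketch of such a no-filter run, using the path from the config above. `start_position` and `sincedb_path` are additions that do not appear in the posted configs: by default the file input tails files from the end and remembers how far it has read, so a file that already existed before Logstash started may produce no events at all. The `/dev/null` value assumes a Unix-like system.

```conf
input {
  file {
    path => "/opt/Log/*.xml"
    # Assumption: read from the top instead of tailing the end of the file.
    start_position => "beginning"
    # Assumption (Unix-like systems): do not persist the read position between runs.
    sincedb_path => "/dev/null"
  }
}

output {
  # Print every event as a readable structure, one field per line.
  stdout { codec => rubydebug }
}
```

If events show up here one per physical line, the input is working and the problem lies in the filter section.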

