Using Logstash


#1

I am trying to use the Logstash documentation to set up an XML filter, but I just can't seem to get it right.

My XML format is pretty straightforward:


(Jay Greenberg) #2

@fereshteh,

XML parsing should work easily, even with multiline. Consider the following input (xml.log):

<html lang="en" class="sample">
  <head mycompany="http://url.me/profile#">
    <meta charset='utf-8'>
       <mytag mykey='myval'/>
    </meta>
  </head>
</html>

<html lang="en" class="sample">
  <head mycompany="http://url.me/profile#">
    <meta charset='utf-8'>
       <mytag mykey2='myval2'/>
    </meta>
  </head>
</html>

And this Logstash Config:

input
{
        file
        {
                path => "/sample-inputs/xml.log"
                sincedb_path => "/dev/null"
                start_position => "beginning"
        }
}

filter
{
        multiline
        {
                pattern => "^<html"
                negate => true
                what => previous
        }
        xml
        {
                source => [ "message" ]
                target => [ "x" ]
        }
}

output
{
        stdout
        {
                codec => rubydebug
        }
}

Produces:

{
       "message" => "<html lang=\"en\" class=\"sample\">\n  <head mycompany=\"http://url.me/profile#\">\n    <meta charset='utf-8'>\n       <mytag mykey='myval'/> \n    </meta>\n  </head>\n</html>\n",
      "@version" => "1",
    "@timestamp" => "2015-10-19T13:54:45.774Z",
          "tags" => [
        [0] "multiline"
    ],
             "x" => {
         "lang" => "en",
        "class" => "sample",
         "head" => [
            [0] {
                "mycompany" => "http://url.me/profile#",
                     "meta" => [
                    [0] {
                        "charset" => "utf-8",
                          "mytag" => [
                            [0] {
                                "mykey" => "myval"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
{
       "message" => "<html lang=\"en\" class=\"sample\">\n  <head mycompany=\"http://url.me/profile#\">\n    <meta charset='utf-8'>\n       <mytag mykey2='myval2'/> \n    </meta>\n  </head>\n</html>",
      "@version" => "1",
    "@timestamp" => "2015-10-19T13:54:45.777Z",
          "tags" => [
        [0] "multiline"
    ],
             "x" => {
         "lang" => "en",
        "class" => "sample",
         "head" => [
            [0] {
                "mycompany" => "http://url.me/profile#",
                     "meta" => [
                    [0] {
                        "charset" => "utf-8",
                          "mytag" => [
                            [0] {
                                "mykey2" => "myval2"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
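
To see why this config produces exactly two events, here is a rough Python sketch of the grouping rule described by `pattern => "^<html"`, `negate => true`, `what => previous`. It only illustrates the logic and is not how Logstash implements it; the `group_multiline` helper and the shortened sample lines are made up for this example.

```python
import re

def group_multiline(lines, pattern, negate=True):
    """Group raw lines into events, mimicking `what => previous` (illustrative only)."""
    regex = re.compile(pattern)
    events = []
    for line in lines:
        matched = bool(regex.search(line))
        # With negate => true, a line that does NOT match is appended to the previous event.
        belongs_to_previous = (not matched) if negate else matched
        if belongs_to_previous and events:
            events[-1].append(line)
        else:
            events.append([line])
    return ["\n".join(event) for event in events]

# Shortened stand-in for xml.log: two documents separated by a blank line.
lines = [
    '<html lang="en" class="sample">',
    '  <head mycompany="http://url.me/profile#">',
    '  </head>',
    '</html>',
    '',
    '<html lang="en" class="sample">',
    '  <head mycompany="http://url.me/profile#">',
    '  </head>',
    '</html>',
]

events = group_multiline(lines, r"^<html")
print(len(events))
```

Every line that does not start with `<html` is glued onto the event before it. That is also why the blank separator line leaves a trailing `\n` on the first message but not on the second.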

(Jay Greenberg) #4

@fereshteh,

You can do that too. If you want to parse out the attributes by XPath, do it like this:

        xml
        {
                store_xml => false
                source => [ "message" ]
                xpath => [
                        "/html/@lang","lang",
                        "/html/@class","class"
                ]
        }

Yields:

{
  "lang" => [
        [0] "en"
    ],
  "class" => [
        [0] "sample"
    ]
}

Note that the attributes come out as arrays, because there can be multiple instances of the node specified by the path. If you like, you could do something like this to convert them to strings.

mutate
{
        join => {
                "lang" => ","
                "class" => ","
        }
}

Yields:

{
         "lang" => "en",
         "class" => "sample"
}
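
If it helps to see the array-then-join behaviour outside Logstash, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. ElementTree does not evaluate attribute steps such as `/html/@lang`, so the hypothetical `attribute_values` helper reads the attribute from each matched element instead. Treat it as an approximation of the idea, not of the xml filter itself.

```python
import xml.etree.ElementTree as ET

doc = """<html lang="en" class="sample">
  <head mycompany="http://url.me/profile#">
    <meta charset='utf-8'>
       <mytag mykey='myval'/>
    </meta>
  </head>
</html>"""

def attribute_values(root, element_path, attribute):
    """Collect one attribute from every element matching element_path (hypothetical helper)."""
    # "." means the root element itself; any other path is resolved relative to it.
    matches = [root] if element_path == "." else root.findall(element_path)
    return [el.get(attribute) for el in matches if el.get(attribute) is not None]

root = ET.fromstring(doc)

# A path can match several nodes, so each result is a list, like the arrays above.
lang = attribute_values(root, ".", "lang")
css_class = attribute_values(root, ".", "class")

# Rough equivalent of mutate's join: collapse each list into one comma-separated string.
joined = {"lang": ",".join(lang), "class": ",".join(css_class)}
print(lang, css_class, joined)
```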

HTH


(Jay Greenberg) #6

The multiline pattern requires the <html tag to be at the very beginning of the line, so make sure your input file is formatted exactly like mine.

Let me know if that helps!

Jay


(Jay Greenberg) #8

On Windows, use the stdin input instead, and redirect xml.log into it. In the config (test.cfg):

input {
        stdin { }
}

Then run:

bin\logstash -f test.cfg < xml.log

I have not tested this, but it should work.


(Jay Greenberg) #10

@fereshteh,

In your initial question, the XML was formatted on multiple lines, so we used the multiline filter. Is your expected input always on a single line? Or do you still expect to see events over multiple lines?


(Jay Greenberg) #11

You can use the following to split your single-line XML into multiple lines:

        mutate {
                # The replacement string contains a literal newline on purpose.
                gsub => ["message", "</html><html", "</html>
<html"]
        }

        split {
        }

        xml {
                source => ["message"]
                target => ["x"]
        }
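
For what it's worth, the gsub-then-split idea can be sketched in a few lines of Python. The sample `message` is invented for illustration. The pattern has no regex metacharacters, so a plain string replace behaves the same way here, and the split step assumes the split filter's defaults (the `message` field, split on newline).

```python
# Invented single-line input: two documents back to back on one line.
message = (
    '<html lang="en"><mytag mykey="myval"/></html>'
    '<html lang="en"><mytag mykey2="myval2"/></html>'
)

# Counterpart of the gsub: insert a newline between adjacent documents.
separated = message.replace("</html><html", "</html>\n<html")

# Counterpart of the split filter: one event per line of the message.
events = separated.split("\n")

for event in events:
    print(event)
```

Each resulting string is a complete document on its own, which is what the xml filter then parses.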

(Jay Greenberg) #13

@fereshteh,

In that case, you first need to extract the XML from the rest of the message using grok, like so:

filter
{
        grok {
                match => { "message" => ".*(?<the_xml><html.*</html>).*" }        
        }
        xml {
                store_xml => false
                source => [ "the_xml" ]
                xpath => [
                        "/html/@lang","lang",
                        "/html/@class","class"
                ]
        }
}
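
If you want to try the pattern before putting it into grok, the same regular expression can be tested in Python. This is only a sketch: the sample log line is invented, and Python spells the named group `(?P<the_xml>...)` where grok uses `(?<the_xml>...)`.

```python
import re

# Invented log line with text before and after the XML payload.
line = 'Oct 19 13:54:45 host app: <html lang="en" class="sample"></html> trailing text'

# Same pattern as in the grok filter, with Python's named-group syntax.
pattern = re.compile(r".*(?P<the_xml><html.*</html>).*")

match = pattern.match(line)
the_xml = match.group("the_xml") if match else None
print(the_xml)
```

Note that `.` does not match a newline by default, so this only works when the whole XML document sits on a single line.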
