Need help parsing an XML file with Logstash

Hi All,

I need help parsing my XML file.
My XML file looks like this:

 <UserList>
    <User statefile='10'>
       <UserAt atName='NUMBER'>3312348555</UserAt>
       <UserAt atName='ID'>308014204145695</UserAt>
       <UserAt atName='segTo'>X5XXX</UserAt>
       <UserAt  atName='descriptionSegTo'>XXX 10Ci 4C XX XXX</UserAt >
       <UserAt   atName='payMode'>yes</UserAt >
          </User>
   <User statefile='60'>
       <UserAt atName='NUMBER'>3312348555</UserAt>
       <UserAt atName='ID'>308014204145695</UserAt>
       <UserAt atName='segTo'>X5XXX</UserAt>
       <UserAt  atName='descriptionSegTo'>XXX 10Ci 4C XX XXX</UserAt >
       <UserAt   atName='payMode'>yes</UserAt>
       <ServPrest hash='TTGHB' date=''/>
  </User>
 ..... 
 </UserList>

I want my output in this format:

"NUMBER" => 3312348555
 "ID" => 308014204145695
 "segTo" => X5XXX
 "descriptionSegTo" => XXX 10Ci 4C XX XXX
 "payMode" => yes

and for the second User element:

"NUMBER" => 3312348555
 "ID" => 308014204145695
 "segTo" => X5XXX
 "descriptionSegTo" => XXX 10Ci 4C XX XXX
 "payMode" => yes
 "hash" => TTGHB 
 .....

My logstash configuration:

input {
    file {
       path => "/home/osad/fichierxml"
       sincedb_path => "/dev/null"
       start_position => "beginning"
       codec => multiline {
          pattern => "<User>"
          negate => true
          what => "previous"
      }
    }
 }


filter {
    xml{
       source => "message"
       store_xml => false
      # target => "doc"
       xpath  =>
        [
          "/User/UserAt/@atName='NUMBER'","NUMBER",
          "/User/UserAt/@atName='ID'","ID",
          "/User/UserAt/@atName='segTo'","SEGTO"
        ]
    }
}




output {
  elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "test_xml"
    }
    stdout {
     codec => rubydebug
    }
}

But I get no results when I run Logstash; it stays stuck on:

 [2019-08-23T09:29:17,519][INFO ][logstash.javapipeline    ] Pipeline started {"pipeline.id"=>"main"}
[2019-08-23T09:29:17,812][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-08-23T09:29:17,910][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-08-23T09:29:19,373][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Thank you in advance !

That does not match any of the lines in your file. You could try removing the trailing >

Hi @Badger
Thanks for your reply

I did what you suggested but I still get the same result.
Is my xpath correct?

If you use

pattern => "<User "

with a trailing space the xpath appears to parse.

    "NUMBER" => [
    [0] "true"
],
     "SEGTO" => [
    [0] "true"
],
        "ID" => [
    [0] "true"
],

Not sure whether that's what you wanted though.
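The `@atName='NUMBER'` form evaluates an equality over the node set, which is why each result is the boolean "true". If the goal is the element text, an XPath predicate should select the matching elements instead; this is an untested sketch assuming each event holds a single User element:

    xpath => [
        "/User/UserAt[@atName='NUMBER']/text()", "NUMBER",
        "/User/UserAt[@atName='ID']/text()", "ID",
        "/User/UserAt[@atName='segTo']/text()", "SEGTO"
    ]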

I don't really understand what you mean.
Is this what you asked for:

input {
    file {
       path => "/home/osad/fichierxml"
       sincedb_path => "/dev/null"
       start_position => "beginning"
       codec => multiline {
          pattern => "</User>"
          negate => true
          what => "previous"
      }
    }
 }


filter {
    xml{
       source => "message"
       store_xml => false
      # target => "doc"
       xpath  =>[ "/User/UserAt/@atName='NUMBER'","NUMBER",
                  "/User/UserAt/@atName='ID'","ID",
                  "/User/UserAt/@atName='segTo'","SEGTO"
                 ]
 }
}

That's not going to work. I suggest you add

output { stdout { codec => rubydebug {} } }

and focus on getting the multiline codec working before you worry about the xpath expressions.

I changed the multiline codec to this:

    codec => multiline {
             pattern => "^"
             negate => true
             what => "previous"
          }

But it still stays stuck on:

[2019-08-23T18:25:41,126][INFO ][logstash.javapipeline    ] Pipeline started {"pipeline.id"=>"main"}
[2019-08-23T18:25:41,318][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-08-23T18:25:41,429][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-08-23T18:25:42,722][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

That pattern matches every line, so this will never combine lines. If you change negate => false then it will combine every line in the file into a single event. However, it will never flush it to the pipeline unless you add 'auto_flush_interval => 1'
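Putting that together, a sketch of the suggested codec (not tested against your file):

    codec => multiline {
        pattern => "^"                # matches every line
        negate => false               # so every line is appended...
        what => "previous"            # ...to the previous one
        auto_flush_interval => 1      # flush the combined event after 1s of inactivity
    }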

You could then parse the XML using this

    xml{
        source => "message"
        store_xml => true
        target => "theXML"
        force_array => false
        remove_field => [ "message" ]
    }
    split { field => "[theXML][User]" }

which will get you events that look like this

    "theXML" => {
    "User" => {
           "UserAt" => [
            [0] {
                 "atName" => "NUMBER",
                "content" => "3312348555"
            },
            [1] {
                 "atName" => "ID",
                "content" => "308014204145695"
            },
            [2] {
                 "atName" => "segTo",
                "content" => "X5XXX"
            },
            [3] {
                 "atName" => "descriptionSegTo",
                "content" => "XXX 10Ci 4C XX XXX"
            },
            [4] {
                 "atName" => "payMode",
                "content" => "yes"
            }
        ],
        "statefile" => "60",
        "ServPrest" => {
            "date" => "",
            "hash" => "TTGHB"
        }
    }
},

You might then want to do something like

    ruby {
        code => '
            u = event.get("[theXML][User][UserAt]")
            if u
                h = {}
                u.each_index { |x|
                    h[ u[x]["atName"] ] = u[x]["content"]
                }
                event.set("[theXML][User][UserAt]", h)
            end
        '
    }
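The core of that filter is just folding the UserAt array of atName/content pairs into one flat hash. In plain Ruby, with sample values taken from the file above:

```ruby
# Sample UserAt entries as the xml filter produces them (an array of hashes).
user_at = [
  { "atName" => "NUMBER", "content" => "3312348555" },
  { "atName" => "ID",     "content" => "308014204145695" },
  { "atName" => "segTo",  "content" => "X5XXX" }
]

# Fold the array into a flat name => value hash, as the ruby filter does.
flat = {}
user_at.each { |a| flat[a["atName"]] = a["content"] }

puts flat["NUMBER"]  # prints 3312348555
```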

which would change that to

    "theXML" => {
    "User" => {
           "UserAt" => {
            "descriptionSegTo" => "XXX 10Ci 4C XX XXX",
                     "payMode" => "yes",
                       "segTo" => "X5XXX",
                          "ID" => "308014204145695",
                      "NUMBER" => "3312348555"
        },
        "statefile" => "60",
        "ServPrest" => {
            "date" => "",
            "hash" => "TTGHB"
        }
    }
},

Hi @Badger
Thanks for your reply
I have tested your solution but I don't get the same result as you do.
I get errors like:

[2019-08-26T10:48:46,039][WARN ][logstash.filters.xml ] Error parsing xml with XmlSimple {:source=>"message", :value=>" ", :exception=>#<REXML::ParseException: No close tag for /User

[2019-08-26T10:48:46,114][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[theXML][User] is of type = NilClass
[2019-08-26T10:48:46,115][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[theXML][User] is of type = NilClass

"tags" => [
[0] "_xmlparsefailure",
[1] "_split_type_failure"

Use

output { stdout { codec => rubydebug } }

and verify that the event produced by the multiline codec is valid XML. If it is not then the xml filter will not parse it, and then the other filters will not work.
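One quick way to check validity outside Logstash is Ruby's REXML, the parser behind the error message above. This sketch uses shortened sample strings, not the real file; the truncated one reproduces the "No close tag" failure seen in the log:

```ruby
require "rexml/document"

# A well-formed snippet and a truncated one (no closing tags).
good = "<UserList><User statefile='10'><UserAt atName='NUMBER'>3312348555</UserAt></User></UserList>"
bad  = "<UserList><User statefile='10'>"

[good, bad].each do |xml|
  begin
    REXML::Document.new(xml)
    puts "valid XML"
  rescue REXML::ParseException
    puts "parse error"
  end
end
# prints:
#   valid XML
#   parse error
```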

I don't have a multiline error in the tags, but I have the impression that the multiline codec isn't working.
The result looks like this:

   "tags" => [
        [0] "_xmlparsefailure",
        [1] "_split_type_failure"
    ],
          "path" => "/home/osad/fichierxml",
    "@timestamp" => 2019-08-26T13:24:17.174Z,
          "host" => "elk.lab.fr",
      "@version" => "1"
}
{
          "tags" => [
        [0] "_split_type_failure"
    ],
          "user" => {
         "atName" => "NUMBER",
        "content" => "3312348555"
    },
          "path" => "/home/osad/fichierxml",
    "@timestamp" => 2019-08-26T13:24:17.174Z,
          "host" => "elk.lab.fr",
      "@version" => "1"
}
{
          "tags" => [
        [0] "_split_type_failure"
    ],
          "user" => {
         "atName" => "ID",
        "content" => "308014204145695"
    },
          "path" => "/home/osad/fichierxml",
    "@timestamp" => 2019-08-26T13:24:17.175Z,
          "host" => "elk.lab.fr",
      "@version" => "1"
}
{
          "tags" => [
        [0] "_split_type_failure"
    ],
          "user" => {
         "atName" => "segTo",
        "content" => "X5XXX"
    },
          "path" => "/home/osad/fichierxml",
    "@timestamp" => 2019-08-26T13:24:17.176Z,
          "host" => "elk.lab.fr",
      "@version" => "1"
}

If you have multiple events and a _split_type_failure tag then we can be sure that the multiline filter did not combine the entire file into a single event. That's what you need to fix first.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.