How to filter XML with Logstash

Hi, I have this Logstash config:

input {
    file {
        path => ["/mnt/data/*/*.xml"]
        start_position => "beginning"
        #sincedb_path => "/opt/logstash/sincedb-xml49"
        sincedb_path => "/dev/null"
        codec => multiline  {
            #pattern => "<?xml " 
            #pattern => "^<NewDataSet.*\>"
            #pattern => "<TransferLogDetailForExport>"
            pattern => "<Vouchers>"
            negate => "true"
            what => "previous"
        }

    }
}

filter {
    xml {
        force_array => "false"
        store_xml => "false"
        source => "message"
        target => "Vouchers"
        #xpath => ["/NewDataSet/TransferLogDetailForExport/LogID/text()", "LogID"]
        #xpath => ["/NewDataSet/TransferLogDetailForExport/LogDateTime/text()", "LogDateTime"]
        xpath => ["/Vouchers/VoucherID/text()", "VoucherID"]
        xpath => ["/Vouchers/VoucherTypeID/text()", "VoucherTypeID"]
        xpath => ["/Vouchers/ComputerID/text()", "ComputerID"]
        xpath => ["/Vouchers/Used/text()", "Used"]
    }
}

output {
    #elasticsearch {
    #    hosts => "159.138.237.176:9200"
    #    index => "cup49"                
    #}
  
    stdout 
    {
        codec => rubydebug
    }

}        

Current output:

{
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "2",
"ComputerID" : "0"
},
{
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "73",
"ComputerID" : "0"
},
{
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "6",
"ComputerID" : "0"
},

I would like this output:

{
"LogID": "15237",
"LogDateTime": "2020-01-07T17:00:47",
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "2",
"ComputerID" : "0"
},
{
"LogID": "15237",
"LogDateTime": "2020-01-07T17:00:47",
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "73",
"ComputerID" : "0"
},
{
"LogID": "15237",
"LogDateTime": "2020-01-07T17:00:47",
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "6",
"ComputerID" : "0"
},

This is my XML data:

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
  <TransferLogDetailForExport>
     <LogID>15237</LogID>
     <LogDateTime>2020-01-07T17:00:47</LogDateTime>
     <GroupType>1</GroupType>
     <DataType>4</DataType>
     <FromShopID>53</FromShopID>
     <DestinationShopID>1</DestinationShopID>
     <FileName>001_TEST053_001_20200107_170047</FileName>
     <CriteriaStartTime>2020-01-06T17:00:16</CriteriaStartTime>
     <ISFromLastUpdate>1</ISFromLastUpdate>
     <StaffID>-1</StaffID>
     <UpdateDate>2020-01-07T17:00:47</UpdateDate>
     <RetryTime>0</RetryTime>
     <ResultCode>1</ResultCode>
     <DatabaseName>test_db</DatabaseName>
     <IPAddress>192.168.1.1</IPAddress>
     <ExportType>XML</ExportType>
  </TransferLogDetailForExport>
  <Vouchers>
     <VoucherID>2</VoucherID>
     <VoucherTypeID>24</VoucherTypeID>
     <ComputerID>0</ComputerID>
     <Used>1</Used>
  </Vouchers>
  <Vouchers>
     <VoucherID>3</VoucherID>
     <VoucherTypeID>24</VoucherTypeID>
     <ComputerID>0</ComputerID>
     <Used>1</Used>
   </Vouchers>
   <Vouchers>
     <VoucherID>6</VoucherID>
     <VoucherTypeID>24</VoucherTypeID>
     <ComputerID>0</ComputerID>
     <Used>1</Used>
   </Vouchers>
 <Vouchers>
       ........
  </Vouchers>
</NewDataSet>

Please help.
Thank you.

Hi,
Your LogID and LogDateTime xpath lines are commented out.

They end up in a different _doc. Please see the attached image.

Thank you.

Sorry I’ve misunderstood what you’re trying to do.

It looks like you input the date and time once, then iterate over the vouchers to create an event. However, this is splitting your times and vouchers into two separate events.

Then in the filters, when you parse the XML, you split out the vouchers but nothing else.

That's what I'm getting from it anyway (I could be wrong!)

I would look at getting everything into one document, making your vouchers an array of values, then using the split filter to separate them into different events (which would keep the date stamps).

I think that’d work :slight_smile:
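As a toy model of that idea in plain Ruby (field names are made up for illustration; in a real pipeline this is what Logstash's split filter does for you):

```ruby
# One event carrying the shared timestamp plus an array of vouchers.
event = {
  "LogDateTime" => "2020-01-07T17:00:47",
  "Vouchers"    => [{ "VoucherID" => "2" }, { "VoucherID" => "3" }]
}

# What split does, conceptually: one new event per array element,
# with every other field duplicated onto each new event.
split_events = event["Vouchers"].map { |v| event.merge("Vouchers" => v) }

split_events.each { |e| puts e.inspect }
```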

Sorry, I'm not good at English. I want the LogID and LogDateTime from the TransferLogDetailForExport tag added to each Vouchers event.

Example output

{
"LogID": "15237",
"LogDateTime": "2020-01-07T17:00:47",
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "2",
"ComputerID" : "0"
},
{
"LogID": "15237",
"LogDateTime": "2020-01-07T17:00:47",
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "73",
"ComputerID" : "0"
},
{
"LogID": "15237",
"LogDateTime": "2020-01-07T17:00:47",
"Used" : "1",
"VoucherTypeID" : "24",
"VoucherID" : "6",
"ComputerID" : "0"
},
{
....
}

Thank you.

With that XML you could use

xml { source => "message" target => "theXML" force_array => false remove_field => [ "message" ] }
mutate {
    add_field => {
        "LogID" => "%{[theXML][TransferLogDetailForExport][LogID]}"
        "LogDateTime" => "%{[theXML][TransferLogDetailForExport][LogDateTime]}"
    }
}
split { field => "[theXML][Vouchers]" }
ruby {
    code => '
        event.get("[theXML][Vouchers]").each { |k, v| event.set(k, v) }
        event.remove("[theXML]")
    '
}

to produce

{
    "VoucherID" => "2",
  "LogDateTime" => "2020-01-07T17:00:47",
   "@timestamp" => 2020-01-20T21:39:34.862Z,
"VoucherTypeID" => "24",
         "Used" => "1",
   "ComputerID" => "0",
        "LogID" => "15237"
}
{
    "VoucherID" => "3",
  "LogDateTime" => "2020-01-07T17:00:47",
   "@timestamp" => 2020-01-20T21:39:34.862Z,
"VoucherTypeID" => "24",
         "Used" => "1",
   "ComputerID" => "0",
        "LogID" => "15237"
}

etc.
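If it helps, the ruby block in that filter just promotes each key of the voucher hash to a top-level field. The same iteration in plain Ruby, with hashes standing in for the event (values taken from the sample data above):

```ruby
# After the split filter, [theXML][Vouchers] holds a single voucher hash.
voucher = { "VoucherID" => "2", "VoucherTypeID" => "24",
            "ComputerID" => "0", "Used" => "1" }

# Fields already on the event, added by the mutate filter.
event = { "LogID" => "15237", "LogDateTime" => "2020-01-07T17:00:47" }

# Equivalent of event.get(...).each { |k, v| event.set(k, v) }:
# copy every voucher key up to the top level.
voucher.each { |k, v| event[k] = v }
```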

I have an error log

and output

My Logstash config:

input {
    file {
        path => ["/mnt/data/*/*.xml"]
        start_position => "beginning"
        #sincedb_path => "/opt/logstash/sincedb-xml49"
        sincedb_path => "/dev/null"
        codec => multiline  {
            #pattern => "<?xml " 
            #pattern => "^<NewDataSet.*\>"
            #pattern => "<TransferLogDetailForExport>"
            pattern => "<Vouchers>"
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    xml {
        source => "message" 
        target => "theXML"
        force_array => false 
        remove_field => [ "message" ]
    }

    mutate {
        add_field => {
                "LogID" => "%{[theXML][TransferLogDetailForExport][LogID]}"
                "LogDateTime" => "%{[theXML][TransferLogDetailForExport][LogDateTime]}"
        }
    }

    split { 
         field => "[theXML][Vouchers]" 
    }

    ruby {
        code => 'event.get("[theXML][Vouchers]").each { |k, v| event.set(k, v) }
        event.remove("[theXML]")'       
    }
    
}

output {
    elasticsearch {
        hosts => "159.138.237.176:9200"
        index => "cup52"                
    }
  
    stdout 
    {
        codec => rubydebug
    }

}

Could you give us some indication in text what those images say?

Log output from the Logstash Docker container:

logstash    | [2020-01-21T08:44:50,085][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
logstash    | [2020-01-21T08:44:51,072][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"message", :value=>"<?xml version=\"1.0\" standalone=\"yes\"?>\r\n<NewDataSet>\r\n  <TransferLogDetailForExport>\r\n    <LogID>15237</LogID>\r\n    <LogDateTime>2020-01-07T17:00:47</LogDateTime>\r\n    <GroupType>1</GroupType>\r\n    <DataType>4</DataType>\r\n    <FromShopID>53</FromShopID>\r\n    <DestinationShopID>1</DestinationShopID>\r\n    <FileName>001_CUPVCR053_001_20200107_170047</FileName>\r\n    <CriteriaStartTime>2020-01-06T17:00:16</CriteriaStartTime>\r\n    <ISFromLastUpdate>1</ISFromLastUpdate>\r\n    <StaffID>-1</StaffID>\r\n    <UpdateDate>2020-01-07T17:00:47</UpdateDate>\r\n    <RetryTime>0</RetryTime>\r\n    <ResultCode>1</ResultCode>\r\n    <DatabaseName>test</DatabaseName>\r\n    <IPAddress>192.168.1.1</IPAddress>\r\n    <ExportType>XML</ExportType>\r\n  </TransferLogDetailForExport>\r", :exception=>#<REXML::ParseException: No close tag for /NewDataSet
logstash    | Line: 20
logstash    | Position: 754
logstash    | Last 80 unconsumed characters:
logstash    | >, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-xml-4.0.7/lib/logstash/filters/xml.rb:185:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:143:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:162:in `block in multi_filter'", "org/jruby/RubyArray.java:1800:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:115:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:251:in `block in start_workers'"]}
logstash    | [2020-01-21T08:44:51,539][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,541][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,541][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,542][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,542][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,542][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,543][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,545][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,545][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[theXML][Vouchers] is of type = NilClass
logstash    | [2020-01-21T08:44:51,554][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,557][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,564][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,565][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,566][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,567][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,567][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,568][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | [2020-01-21T08:44:51,569][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass
logstash    | /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
logstash    | {
logstash    |            "host" => "5cacf0bc6d71",
logstash    |         "message" => "<?xml version=\"1.0\" standalone=\"yes\"?>\r\n<NewDataSet>\r\n  <TransferLogDetailForExport>\r\n    <LogID>15237</LogID>\r\n    <LogDateTime>2020-01-07T17:00:47</LogDateTime>\r\n    <GroupType>1</GroupType>\r\n    <DataType>4</DataType>\r\n    <FromShopID>53</FromShopID>\r\n    <DestinationShopID>1</DestinationShopID>\r\n    <FileName>001_CUPVCR053_001_20200107_170047</FileName>\r\n    <CriteriaStartTime>2020-01-06T17:00:16</CriteriaStartTime>\r\n    <ISFromLastUpdate>1</ISFromLastUpdate>\r\n    <StaffID>-1</StaffID>\r\n    <UpdateDate>2020-01-07T17:00:47</UpdateDate>\r\n    <RetryTime>0</RetryTime>\r\n    <ResultCode>1</ResultCode>\r\n    <DatabaseName>test</DatabaseName>\r\n    <IPAddress>192.168.1.1</IPAddress>\r\n    <ExportType>XML</ExportType>\r\n  </TransferLogDetailForExport>\r",
logstash    |        "@version" => "1",
logstash    |            "path" => "/mnt/data/001_CUPVCR053_001_20200107_170047/ExportData.xml",
logstash    |      "@timestamp" => 2020-01-21T01:44:49.430Z,
logstash    |           "LogID" => "%{[theXML][TransferLogDetailForExport][LogID]}",
logstash    |     "LogDateTime" => "%{[theXML][TransferLogDetailForExport][LogDateTime]}",
logstash    |            "tags" => [
logstash    |         [0] "multiline",
logstash    |         [1] "_xmlparsefailure",
logstash    |         [2] "_split_type_failure",
logstash    |         [3] "_rubyexception"
logstash    |     ]
logstash    | }

From Elasticsearch:

  {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 10,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "cup52",
            "_type" : "_doc",
            "_id" : "zsR_xW8BiVWNNH1HT1OF",
            "_score" : 1.0,
            "_source" : {
              "message" : "<?xml version=\"1.0\" standalone=\"yes\"?>\r\n<NewDataSet>\r\n  <TransferLogDetailForExport>\r\n    <LogID>15237</LogID>\r\n    <LogDateTime>2020-01-07T17:00:47</LogDateTime>\r\n    <GroupType>1</GroupType>\r\n    <DataType>4</DataType>\r\n    <FromShopID>53</FromShopID>\r\n    <DestinationShopID>1</DestinationShopID>\r\n    <FileName>001_CUPVCR053_001_20200107_170047</FileName>\r\n    <CriteriaStartTime>2020-01-06T17:00:16</CriteriaStartTime>\r\n    <ISFromLastUpdate>1</ISFromLastUpdate>\r\n    <StaffID>-1</StaffID>\r\n    <UpdateDate>2020-01-07T17:00:47</UpdateDate>\r\n    <RetryTime>0</RetryTime>\r\n    <ResultCode>1</ResultCode>\r\n    <DatabaseName>test</DatabaseName>\r\n    <IPAddress>192.168.1.1</IPAddress>\r\n    <ExportType>XML</ExportType>\r\n  </TransferLogDetailForExport>\r",
              "host" : "5cacf0bc6d71",
              "path" : "/mnt/data/001_CUPVCR053_001_20200107_170047/ExportData.xml",
              "LogID" : "%{[theXML][TransferLogDetailForExport][LogID]}",
              "tags" : [
                "multiline",
                "_xmlparsefailure",
                "_split_type_failure",
                "_rubyexception"
              ],
              "LogDateTime" : "%{[theXML][TransferLogDetailForExport][LogDateTime]}",
              "@timestamp" : "2020-01-21T00:27:29.814Z",
              "@version" : "1"
            }
          },
          {
            "_index" : "cup52",
            "_type" : "_doc",
            "_id" : "z8R_xW8BiVWNNH1HT1OF",
            "_score" : 1.0,
            "_source" : {
              "host" : "5cacf0bc6d71",
              "path" : "/mnt/data/001_CUPVCR053_001_20200107_170047/ExportData.xml",
              "LogID" : "%{[theXML][TransferLogDetailForExport][LogID]}",
              "tags" : [
                "multiline",
                "_split_type_failure",
                "_rubyexception"
              ],
              "LogDateTime" : "%{[theXML][TransferLogDetailForExport][LogDateTime]}",
              "@timestamp" : "2020-01-21T00:27:29.929Z",
              "theXML" : {
                "VoucherTypeID" : "24",
                "VoucherID" : "2",
                "ComputerID" : "0",
                "Used" : "1"
              },
              "@version" : "1"
            }
          },
          {
            "_index" : "cup52",
            "_type" : "_doc",
            "_id" : "0MR_xW8BiVWNNH1HT1OF",
            "_score" : 1.0,
            "_source" : {
              "host" : "5cacf0bc6d71",
              "path" : "/mnt/data/001_CUPVCR053_001_20200107_170047/ExportData.xml",
              "LogID" : "%{[theXML][TransferLogDetailForExport][LogID]}",
              "tags" : [
                "multiline",
                "_split_type_failure",
                "_rubyexception"
              ],
              "LogDateTime" : "%{[theXML][TransferLogDetailForExport][LogDateTime]}",
              "@timestamp" : "2020-01-21T00:27:29.931Z",
              "theXML" : {
                "VoucherTypeID" : "24",
                "VoucherID" : "3",
                "ComputerID" : "0",
                "Used" : "1"
              },
              "@version" : "1"
            }
          },
          {
            "_index" : "cup52",
            "_type" : "_doc",
            "_id" : "0cR_xW8BiVWNNH1HT1OF",
            "_score" : 1.0,
            "_source" : {
              "host" : "5cacf0bc6d71",
              "path" : "/mnt/data/001_CUPVCR053_001_20200107_170047/ExportData.xml",
              "LogID" : "%{[theXML][TransferLogDetailForExport][LogID]}",
              "tags" : [
                "multiline",
                "_split_type_failure",
                "_rubyexception"
              ],
              "LogDateTime" : "%{[theXML][TransferLogDetailForExport][LogDateTime]}",
              "@timestamp" : "2020-01-21T00:27:29.938Z",
              "theXML" : {
                "VoucherTypeID" : "24",
                "VoucherID" : "6",
                "ComputerID" : "0",
                "Used" : "1"
              },
              "@version" : "1"
            }
          },
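A note on the _xmlparsefailure in that log: the multiline pattern "<Vouchers>" flushes the NewDataSet header as its own event, and that fragment has no closing </NewDataSet> tag, which is exactly what REXML complains about. On that event [theXML] is never populated, so [theXML][Vouchers] is nil, which explains the split and ruby failures as well. One way to keep each file as a single event (a sketch, assuming one XML document per file, not tested against this exact data) is to anchor the pattern on the XML declaration and flush on idle:

```
codec => multiline {
    pattern => "^<\?xml "
    negate => true
    what => "previous"
    # Nothing in the file ever starts a new event after the declaration,
    # so flush the buffered document after a couple of seconds of inactivity.
    auto_flush_interval => 2
}
```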

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.