Parsing XML data with XPath

Hello, I am new to Logstash and the whole Elasticsearch stack. I am trying to make a Logstash config file to ingest Cisco Telemetry data.

This is the structure of the data I am trying to ingest:

[host 10.10.20.30 session-id 35] Delivering to <ncclient.operations.subscribe.EstablishSubscription object at 0x7fb82b2e5780>
Subscription Result : notif-bis:ok
Subscription Id     : 2147483650
-->>
(Default Callback)
Event time      : 2019-09-12 13:13:30.290000+00:00
Subscription Id : 2147483650
Type            : 1
Data            :
<datastore-contents-xml xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-push">
  <memory-statistics xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-memory-oper">
    <memory-statistic>
      <name>Processor</name>
      <total-memory>2238677360</total-memory>
      <used-memory>340449924</used-memory>
      <free-memory>1898227436</free-memory>
      <lowest-usage>1897220640</lowest-usage>
      <highest-usage>1264110388</highest-usage>
    </memory-statistic>
    <memory-statistic>
      <name>lsmpi_io</name>
      <total-memory>3149400</total-memory>
      <used-memory>3148576</used-memory>
      <free-memory>824</free-memory>
      <lowest-usage>824</lowest-usage>
      <highest-usage>412</highest-usage>
    </memory-statistic>
  </memory-statistics>
</datastore-contents-xml>
<<--
-->>
(Default Callback)
Event time      : 2019-09-12 13:13:40.290000+00:00
Subscription Id : 2147483650
Type            : 1
Data            :
<datastore-contents-xml xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-push">
  <memory-statistics xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-memory-oper">
    <memory-statistic>
      <name>Processor</name>
      <total-memory>2238677360</total-memory>
      <used-memory>340449924</used-memory>
      <free-memory>1898227436</free-memory>
      <lowest-usage>1897220640</lowest-usage>
      <highest-usage>1264110388</highest-usage>
    </memory-statistic>
  ...........

I have added comments in the config describing the problems I am having, and I am open to any suggestions that can improve it.

This is my current configuration:

input {
    file {
        path => "/home/elastic-stack/logstash-7.3.2/event-data/telemetry.log"
        start_position => "beginning"
        type => "sandbox-out"
        codec => multiline {
            pattern => "^<\?datastore-contents-xml .*\>"
            negate => "true"
            what => "previous"
        }
             
    }
    http { 
        host => "127.0.0.1"
        port => 8080
        type => "sandbox-out"
    }
}
filter {
    grok {
        match => { "message" => "\[%{USER:host_name} %{IP:ip_address} %{USER:session-id} %{NUMBER:session-id-num}\]"}
    }
    grok {
        match => { "message" => "\Subscription Id     \: %{BASE16NUM:subcription-id:int}"}
    }    
    grok {
        match => { "message" => "\Event time      \: %{TIMESTAMP_ISO8601:event-time}"}
    }
    #grok {
    #   match => {"message" => "\<%{USERNAME:Statistic} \xmlns="%{QS:yang-model}\"\>"
    #   I want to access the http link with this config but it is not responding
    #   <memory-statistics xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-memory-oper"> 
    #}
    grok {
        match => {"message" => "\<%{USERNAME:Statistic}\>"}
    }
    mutate {
        remove_field => ["headers", "host_name", "session-id","message"]
    }
    date {
        match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
    }
    xml {
        namespaces => {
            "xmlns" => "http://cisco.com/ns/yang/Cisco-IOS-XE-memory-oper"
        }
        #not even the namespaces option is working to access the http link
        source => "message"
        target => "xml_content"
        xpath => [
            "/*[name()='datastore-contents-xml']/*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='name']/text()" , "name" ,
            "/*[name()='datastore-contents-xml']/*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='total-memory']/text()" , "total-memory",
            "/*[name()='datastore-contents-xml']/*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='used-memory']/text()" , "used-memory",
            "/*[name()='datastore-contents-xml']/*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='free-memory']/text()" , "free-memory" ,
            "/*[name()='datastore-contents-xml']/*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='lowest-memory']/text()" , "lowest-memory" ,
            "/*[name()='datastore-contents-xml']/*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='highest-memory']/text()" , "highest-memory" 
        ]
        #logstash is not detecting any of these xpaths in the config
    }
    mutate {
        convert => {
            "total-memory" => "integer"
            "used-memory" => "integer"
            "free-memory" => "integer"
            "lowest-memory" => "integer"
            "highest-memory" => "integer"
            }
    }
    
    
}
output {
    stdout {
        codec => rubydebug
    }

    file {
        path => "%{type}_%{+dd_MM_yyyy}.log"
    }
}

OUTPUT I am getting:

{
        "ip_address" => "10.10.20.30",
    "subcription-id" => 2147483650,
        "event-time" => "2019-09-12 13:13:30.290000+00:00",
              "host" => "127.0.0.1",
         "Statistic" => "memory-statistic",
              "type" => "sandbox-out",
          "@version" => "1",
        "@timestamp" => 2019-09-26T10:03:00.620Z,
    "session-id-num" => "35"
}

Desired OUTPUT:

{
        "ip_address" => "10.10.20.30",
    "subcription-id" => 2147483650,
        "event-time" => "2019-09-12 13:13:30.290000+00:00",
              "host" => "127.0.0.1",
         "Statistic" => "memory-statistic",
              "type" => "sandbox-out",
          "@version" => "1",
        "@timestamp" => 2019-09-26T10:03:00.620Z,
    "session-id-num" => "35"
        "yang-model" => "http://cisco.com/ns/yang/Cisco-IOS-XE-memory-oper"
              "name" => "Processor"
      "total-memory" => 2238677360
       "used-memory" => 340449924
       "free-memory" => 1898227436
      "lowest-usage" => 1897220640
     "highest-usage" => 1264110388
}

I would really appreciate any help, as I am completely new to Logstash.

With that input and that multiline codec you should not be seeing any events at all, so one of them has to be different to what you are actually using.

I actually removed the multiline plugin and there is no difference in the output. Do you have any suggestions, or should I make changes to the multiline codec plugin?

The following multiline codec will combine the XML with the messages before it:

    codec => multiline {
        pattern => "^</datastore-contents-xml>"
        negate => "true"
        what => "next"
    }
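
To make the grouping easier to picture, here is a rough plain-Ruby model of what that codec does with negate => "true" and what => "next". It is only a sketch: group_events is a hypothetical helper, not part of Logstash, and the sample lines are trimmed from the log in the question.

```ruby
# Simplified model of the multiline codec with negate => "true" and
# what => "next": a line that does NOT match the pattern is joined to the
# line after it, so an event is only complete once a matching line arrives.
# group_events is a hypothetical helper, not part of Logstash.
def group_events(lines, pattern)
  events = []
  buffer = []
  lines.each do |line|
    buffer << line
    next unless line.match?(pattern)

    events << buffer.join("\n")
    buffer = []
  end
  [events, buffer] # buffer holds lines still waiting for a closing tag
end

# Sample lines trimmed from the telemetry log in the question.
lines = [
  "Event time      : 2019-09-12 13:13:30.290000+00:00",
  "Data            :",
  '<datastore-contents-xml xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-push">',
  "  <memory-statistics>",
  "  </memory-statistics>",
  "</datastore-contents-xml>",
  "<<--",
  "-->>"
]

events, pending = group_events(lines, /^<\/datastore-contents-xml>/)
puts events.length
puts pending.inspect
```

In this model the "<<--" and "-->>" separator lines end up at the start of the following event, which is one reason the text in front of the XML has to be stripped before the xml filter sees it.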

The next problem is that the xml filter is very forgiving of random text around the XML when using the target option, but completely unforgiving when using xpath. So you need to remove all the junk preceding the XML.

ruby { code => 'event.set("justXml", event.get("message").match(/.+(<datastore-contents-xml.*)/m)[1])' }

(And update your xml filter to source that field.) Thirdly, your xpath expressions are wrong. You could change them to start with //

"//*[name()='memory-statistics']/*[name()='memory-statistic'][1]/*[name()='used-memory']/text()" , "used-memory"

works, but I would go with the simpler

"/datastore-contents-xml/memory-statistics/memory-statistic[1]/name/text()" , "name" ,

or even

"//memory-statistics/memory-statistic[1]/free-memory/text()" , "free-memory"

You might want to set the force_array option on the xml filter.

Thank you for giving such a detailed answer.

I was trying to apply the recommendations you made, but I have a problem with the one I quoted. I get this error in the Logstash console:

[2019-09-27T09:18:55,622][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `match' for nil:NilClass
/home/elastic-stack/logstash-7.3.2/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
        "ip_address" => "10.10.20.30",
    "subcription-id" => 2147483650,
    "session-id-num" => "35",
              "tags" => [
        [0] "_rubyexception"
    ],
         "Statistic" => "memory-statistic",
        "event-time" => "2019-09-12 13:13:30.290000+00:00",
              "type" => "sandbox-out",
          "@version" => "1",
              "host" => "127.0.0.1",
        "@timestamp" => 2019-09-27T07:18:54.868Z

I looked at the documentation for the event API, where I found event.set() and event.get(),
and I understand that this command is meant to extract only the XML from the whole log message.
Any help resolving this error would be appreciated.
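
A note on the exception above: "undefined method `match' for nil:NilClass" means that event.get("message") returned nil, so the event had no message field when the ruby filter ran. One possible cause, assuming the ruby filter was placed after the mutate that lists "message" in remove_field, is that the field had already been removed. Below is a minimal plain-Ruby sketch of a guarded extraction. extract_xml is a hypothetical helper name, and the regex is the one from the reply above.

```ruby
# Hypothetical helper: returns the XML portion of a message, or nil when the
# message is missing or contains no <datastore-contents-xml element.
def extract_xml(message)
  return nil if message.nil?

  found = message.match(/.+(<datastore-contents-xml.*)/m)
  found && found[1]
end

# Sample message trimmed from the telemetry log in the question.
sample = "Event time      : 2019-09-12 13:13:30.290000+00:00\n" \
         "Data            :\n" \
         "<datastore-contents-xml xmlns=\"urn:ietf:params:xml:ns:yang:ietf-yang-push\">\n" \
         "</datastore-contents-xml>"

xml = extract_xml(sample)
puts xml
puts extract_xml(nil).inspect
```

Inside a ruby filter the same guard would presumably wrap the event.get and event.set calls, setting justXml only when the result is not nil.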

I am an idiot. I solved it, thank you so very much.
