XML with optional fields not added to the result array by Logstash

Hi :slight_smile:

I'm moving slowly in this new world of the Elastic Stack.
I'm currently stuck on a parsing issue with my XML file (IP addresses and nicknames removed):

<inspircdstats>
<userlist>
<user>
  <nickname>User1</nickname>
  <ipaddress>2a01a:b:c:d:e:f</ipaddress>
  <metadata>
    <meta name="cloaked_host"/>
    <meta name="ssl_cert">vtrsE No certificate was found.</meta>
  </metadata>
</user>
<user>
  <nickname>User2</nickname>
  <ipaddress>90.255.255.255</ipaddress>
  <metadata>
    <meta name="cloaked_host"/>
  </metadata>
</user>
<user>
  <nickname>User3</nickname>
  <ipaddress>92.255.255.255</ipaddress>
  <metadata>
   <meta name="cloaked_host"/>     
   <meta name="ssl_cert">vtrsE No certificate was found.</meta>  
  </metadata>
</user>
</userlist>
</inspircdstats>

The metadata entry named "ssl_cert" means that User1 and User3 are connected over TLS.
User2 is not (metadata/meta[@name='ssl_cert'] is absent).


My current results :

{
                "tags" => [
        [0] "multiline",
        [1] "inspircduser"
    ],
                "host" => "elk",
           "timestamp" => 2021-04-12T13:42:12.000Z,
           "IpAddress" => [
        [0] "2a01a:b:c:d:e:f",
        [1] "90.255.255.255",
        [2] "92.255.255.255"
    ],
                "type" => "xml",
            "NickName" => [
        [0] "User1",
        [1] "User2",
        [2] "User3"
    ],
          "@timestamp" => 2021-04-12T13:53:43.973Z,
    "SecureConnection" => [
        [0] "vtrsE No certificate was found.",
        [1] "vtrsE No certificate was found."
    ],
            "@version" => "1",
                "path" => "/tmp/abx_2021-04-12_154212.xml"
}

But the arrays don't all have the same size (NickName and IpAddress have 3 values, SecureConnection only 2),
so it looks like User1 and User2 are on a secure connection, which is false.

Obviously, it fails when I apply my xml + transpose filters:

xml {
    source => "message"
    store_xml => false
    target => "root"
    suppress_empty => false
    xpath => [
       "/inspircdstats/userlist/user/nickname/text()", "NickName",
       "/inspircdstats/userlist/user/ipaddress/text()", "IpAddress",
       "/inspircdstats/userlist/user/metadata/meta[@name='ssl_cert']/text()", "SecureConnection"
    ]
}


ruby { 
   code => "
      event.set('results', [event.get('NickName'), event.get('IpAddress'), event.get('SecureConnection')].transpose)
    "
}
split { field => "results" }
mutate {
    add_field => {
            "nickname" => "%{[results][0]}"
           "ipaddress" => "%{[results][1]}"
            "secure" => "%{[results][2]}"
    }
}
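
The failure can be reproduced outside Logstash. Below is a minimal sketch in plain Ruby, using the three arrays from the output above as literal sample data. `Array#transpose` raises an `IndexError` when the inner arrays differ in length, and padding the short array would not help either, because nothing records which user the missing value belongs to.

```ruby
# Sample data copied from the rubydebug output above.
nicknames    = ["User1", "User2", "User3"]
ip_addresses = ["2a01a:b:c:d:e:f", "90.255.255.255", "92.255.255.255"]
secure       = ["vtrsE No certificate was found.", "vtrsE No certificate was found."]

# transpose requires every inner array to have the same length.
transpose_error = begin
  [nicknames, ip_addresses, secure].transpose
  nil
rescue IndexError => e
  e
end
puts "transpose failed: #{transpose_error.class}"

# zip pads the short array with nil, but only at the END, so the second
# certificate value is attributed to User2 instead of User3.
zipped = nicknames.zip(ip_addresses, secure)
zipped.each { |row| p row }
```

This is why the per-user structure has to be preserved while parsing, rather than rebuilt afterwards from flat arrays.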

The goal is to generate one event for each user :

{
			"host" => "elk",
			"timestamp" => 2021-04-12T13:42:12.000Z,
			"@timestamp" => 2021-04-12T13:53:43.973Z,
			"NickName" => "User1"
			"IpAddress" => "2a01a:b:c:d:e:f",
			"SecureConnection" => "vtrsE No certificate was found."
			"@version" => "1",
			"path" => "/tmp/abx_2021-04-12_154212.xml"
}				

{

			"host" => "elk",
			"timestamp" => 2021-04-12T13:42:12.000Z,
			"@timestamp" => 2021-04-12T13:53:43.973Z,
			"NickName" => "User2"
			"IpAddress" => "90.255.255.255",
			"SecureConnection" => "Null"
			"@version" => "1",
			"path" => "/tmp/abx_2021-04-12_154212.xml"
}	

{

			"host" => "elk",
			"timestamp" => 2021-04-12T13:42:12.000Z,
			"@timestamp" => 2021-04-12T13:53:43.973Z,
			"NickName" => "User3"
			"IpAddress" => "92.255.255.255",
			"SecureConnection" => "vtrsE No certificate was found."
			"@version" => "1",
			"path" => "/tmp/abx_2021-04-12_154212.xml"
}	

I tried the suppress_empty => false option on the xml block, with the same result.

Is there a way to tell xpath to put an empty value in the array if the optional field is not found?

Not that I know of, but that is really an xpath question, not a logstash question.

I would save the XML and use a ruby filter to rearrange things

    xml { source => "message" store_xml => true target => "[@metadata][theXML]" force_array => false remove_field => [ "message" ] }
    ruby {
        code => '
            users = event.get("[@metadata][theXML][userlist][user]")
            userArray = []
            users.each_index { |x|
                user = { "Nickname" => users[x]["nickname"], "IpAddress" => users[x]["ipaddress"] }
                if users[x]["metadata"]["meta"].length == 2
                    user["SecureConnection"] = users[x]["metadata"]["meta"][1]["content"]
                else
                    user["SecureConnection"] = "Null"
                end
                userArray << user
            }
            event.set("users", userArray)
        '
    }
    split { field => "users" }
    # Move fields to top level
    ruby { code => 'event.get("users").each { |k, v| event.set(k, v) }; event.remove("users")' }

will get you

{
        "Nickname" => "User1",
      "@timestamp" => 2021-04-12T17:01:13.585Z,
"SecureConnection" => "vtrsE No certificate was found.",
       "IpAddress" => "2a01a:b:c:d:e:f",
        "@version" => "1",
            "host" => "..."
}
{
        "Nickname" => "User2",
      "@timestamp" => 2021-04-12T17:01:13.585Z,
"SecureConnection" => "Null",
       "IpAddress" => "90.255.255.255",
        "@version" => "1",
            "host" => "..."
}

etc.

Thanks for the quick answer.

A small addition: the metadata in my example wasn't representative, and sometimes I have more or fewer "<meta " entries.
So this test could fail:

 if users[x]["metadata"]["meta"].length == 2

Is it possible to detect the string "ssl_cert" on a meta line ?
I don't really know ruby so I tried things like

if users[x]["metadata"]["meta"].include?(ssl_cert)

but it fails ..

You could try

        code => '
            users = event.get("[@metadata][theXML][userlist][user]")
            userArray = []
            users.each_index { |x|
                user = { "Nickname" => users[x]["nickname"], "IpAddress" => users[x]["ipaddress"] }
                metaArray = users[x]["metadata"]["meta"]
                if ! metaArray.is_a? Array
                    metaArray = [ metaArray ] # Force it to be an array
                end
                metaEntry = metaArray.find { |x| x["name"] == "ssl_cert" } || { "content" => "Null" }
                user["SecureConnection"] = metaEntry["content"]

                userArray << user
            }
            event.set("users", userArray)
        '

That will work even if you get an element like

<metadata>
    <meta name="ssl_cert">vtrsE No certificate was found.</meta>
</metadata>
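
To see why the `include?` attempt fails and why `find` works, here is a small stand-alone Ruby sketch. It assumes the parsed XML arrives as hashes with `name` and `content` keys, and that a single `<meta>` arrives as a Hash rather than an Array, which is the case the code above guards against. The helper name `meta_content` is made up for illustration. Note also that an unquoted `ssl_cert` is read by Ruby as a variable or method name, not as a string.

```ruby
# Hypothetical helper: returns the content of the <meta> entry called
# `name`, or `default` when no such entry exists.
def meta_content(meta, name, default = "Null")
  # A single <meta> is assumed to arrive as a Hash, several as an Array.
  # Kernel#Array would turn a Hash into key/value pairs, so wrap by hand.
  entries = meta.is_a?(Array) ? meta : [meta]
  entry = entries.find { |m| m["name"] == name }
  entry ? entry["content"] : default
end

two_entries = [
  { "name" => "cloaked_host" },
  { "name" => "ssl_cert", "content" => "vtrsE No certificate was found." }
]
one_entry = { "name" => "cloaked_host" }

# include? compares whole elements, so a String never equals a Hash.
puts two_entries.include?("ssl_cert")

# find with a block inspects the "name" key of each entry instead.
puts meta_content(two_entries, "ssl_cert")
puts meta_content(one_entry, "ssl_cert")
```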

Thank you again for your quick answer and help ...

I moved it to production with a few small adjustments, but I found another (and I think last) anomaly in my source XML file: some users don't have any metadata at all :frowning:

<user>
  <nickname>ChezMoiQuebec</nickname>
  <uuid>00BAAAAA4</uuid>
  <realhost>BotServ.Fantasya.org</realhost>
  <displayhost>BotServ.Fantasya.org</displayhost>
  <realname>Chez Moi Quebec !</realname>
  <server>services.europnet.org</server>
  <signon>1618132609</signon>
  <age>1618132609</age>
  <modes>BIk</modes>
  <ident>Quebec</ident>
  <ipaddress>0.0.0.0</ipaddress>
  <metadata/>
</user>

I tried some Ruby code, but I'm very bad at it.
I know I have to check whether '[metadata][meta]' exists before this line:

metaArray = users[x]["metadata"]["meta"]

I tried if users[x]["metadata"]["meta"].exists? but I don't really understand how Ruby code works ... :frowning:

Yes :muscle:

I found the correct code to check whether the array exists: "if users[x]["metadata"].nil?"

code => '
    users = event.get("[@metadata][theXML][userlist][user]")
    userArray = []
    users.each_index { |x|
        user = { "nick" => users[x]["nickname"], "ip" => users[x]["ipaddress"], "server" => users[x]["server"], "uuid" => users[x]["uuid"], "realhost" => users[x]["realhost"] }
        # Users without any metadata are kept, but get no metadata-derived fields
        unless users[x]["metadata"].nil?
            metaArray = users[x]["metadata"]["meta"]
            if ! metaArray.is_a? Array
                metaArray = [ metaArray ] # Force it to be an array
            end
            metaEntry = metaArray.find { |m| m["name"] == "ssl_cert" } || { "content" => "Null" }
            user["SecureConnection"] = metaEntry["content"]

            metaEntry = metaArray.find { |m| m["name"] == "cgiirc_gateway" } || { "content" => "Null" }
            user["webircgateway"] = metaEntry["content"]
        end
        userArray << user
    }
    event.set("users", userArray)
'

...but (there is always a "but"...)
When I tried to parse my whole XML file (36 544 lines / 1756 users), I got a Java error.

And because I'm as good at Java as I am at Ruby, I don't understand it except for the title, "OutOfMemory". Does it just need more (MORE MORE MORE) memory?

OK, I was on a staging environment with -Xms1g and -Xmx1g.

I upgraded to 3g/6g and it works great now :slight_smile:

I just have to think about how often I want to parse this XML file, because it takes a lot of CPU and memory.

Thanks for everything!

That would work, or you could

if users[x]["metadata"]
    metaArray = users[x]["metadata"]["meta"]
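
Both checks (no metadata at all, and a single `<meta>` versus several) can be folded into one nil-safe lookup. The sketch below is plain Ruby and assumes that an empty `<metadata/>` shows up as a missing key, as the `.nil?` test above implies. The helper name `user_meta` and the sample hashes are hypothetical.

```ruby
# Hypothetical helper: looks up the <meta> entry called `name` for one
# user and returns its content, or `default` when anything is missing.
def user_meta(user, name, default = "Null")
  metadata = user["metadata"]
  # No metadata at all (or something unexpected): treat as no entries.
  meta = metadata.is_a?(Hash) ? metadata["meta"] : nil
  # One <meta> is assumed to be a Hash, several an Array, none nil.
  entries = meta.is_a?(Array) ? meta : [meta].compact
  entry = entries.find { |m| m["name"] == name }
  (entry && entry["content"]) || default
end

users = [
  { "nickname" => "User1",
    "metadata" => { "meta" => [
      { "name" => "cloaked_host" },
      { "name" => "ssl_cert", "content" => "vtrsE No certificate was found." }
    ] } },
  { "nickname" => "User2",
    "metadata" => { "meta" => { "name" => "cloaked_host" } } },
  { "nickname" => "ChezMoiQuebec" } # <metadata/> assumed to be dropped
]

results = users.map { |u| [u["nickname"], user_meta(u, "ssl_cert")] }
results.each { |row| p row }
```

Unlike the code above, which leaves the field unset for users without metadata, this variant returns "Null" in that case (and also for an entry that has no content), so every event carries the same set of fields.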
