Targeting Field Content

Hi,
I am trying to get grok to target certain fields and extract the content that follows. For example:

<c_port>3186</c_port>

I want to find c_port and extract the number that follows, in this case 3186. I have tried a number of ways and can't seem to get this working. Thank you.

I ran into a similar situation during my log onboarding and used mutate gsub instead.

For your situation, a regex can specify what to capture, and gsub replaces the whole field with the captured digits.

    mutate {
          gsub => ["field_name", "[^\d]*(\d+)[^\>]*\>", "\1"]
        }

Hope this helps.

You could do that using grok or dissect. If the entire [message] is XML then you also have the option of using an xml filter.

Thank you Kavierkoo.
As Badger suggested, I would prefer to use the XML filter or grok option but I can't seem to find how to make either of those two give me the value (3186) that follows that field (<c_port>). Can I get a hint on how I may incorporate the XML filter or grok to accomplish this?

That is a better idea!

Looks like this could help you.

overwrite

  • Value type is array
  • Default value is []

The fields to overwrite.

This allows you to overwrite a value in a field that already exists.

For example, if you have a syslog line in the message field, you can overwrite the message field with part of the match like so:

    filter {
      grok {
        match => { "message" => "%{SYSLOGBASE} %{DATA:message}" }
        overwrite => [ "message" ]
      }
    }

In this case, a line like May 29 16:37:11 sadness logger: hello world will be parsed and hello world will overwrite the original message.

Kavierkoo,
Thank you for your kindness in taking the time to help me out. I apologize for beating a dead horse, but I am not looking to overwrite anything. I am trying to find a field title (in this case <c_port>) and then extract the content that follows (in this case 3186) while ignoring the closing tag (in this case </c_port>).
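
For the grok route, here is a minimal sketch. It assumes the opening tag, the value, and the closing tag all sit on one line inside [message]. Grok patterns are not anchored unless you anchor them yourself, so the literal tag text can simply surround the standard INT pattern:

    filter {
      grok {
        # Match the literal opening tag, capture the digits into a field
        # named c_port (converted to an integer), then match the closing tag.
        match => { "message" => "<c_port>%{INT:c_port:int}</c_port>" }
      }
    }

If the tag is not present, grok adds the _grokparsefailure tag and c_port is not created.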

This configuration

    input { generator { count => 1 lines => [ '<a><c_port>3186</c_port></a>' ] } }
    filter {
        xml {
            source => "message"
            store_xml => false
            xpath => { "/a/c_port/text()" => "c_port" }
        }
        mutate { replace => { "c_port" => "%{[c_port][0]}" } }
        mutate { convert => { "c_port" => "integer" } }
    }
    output  { stdout { codec => rubydebug { metadata => false } } }

produces

    "c_port" => 3186,

Note that [message] must be a complete and valid XML document for this to work.

To give a better example of the filter configuration I would need to see the complete document, and yes, I realize that may not be possible for you to post.

:dizzy_face: Wow Badger, did you use a tool for this or did that come out of your head that fast? That worked immediately (of course), but my input will be read from a file. The input section looks like the following, which may be wrong for the intended work:

    input {
      file {
        path => "/tmp/test2.xml"
        sincedb_path => "/dev/null"
        start_position => "beginning"
        codec => multiline {
          pattern => "^<name=*\>"
          auto_flush_interval => 1
          negate => "true"
          what => "previous"
          max_lines => 1000000000
          max_bytes => "500 MiB"
        }
      }
    }

The line you provided works beautifully, but I am not sure how to incorporate it so that it does the same thing after reading in the file:

    input { generator { count => 1 lines => [ '<a><c_port>3186</c_port></a>' ] } }

You would not use that, I was just using it to provide an event where the [message] field was valid XML. Provided your multiline codec consumes a single complete XML document you should be OK.
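
A sketch of a multiline codec that groups each document into a single event. It assumes every document in the file begins with an XML declaration (<?xml ...) at the start of a line. That pattern is an assumption, so adjust it to whatever reliably marks the first line of your documents:

    codec => multiline {
      # Any line that does NOT start a new document...
      pattern => "^<\?xml"
      negate => true
      # ...is appended to the event that the previous lines belong to.
      what => "previous"
      # Flush the last document after 1 second without new lines.
      auto_flush_interval => 1
    }

With negate => true and what => "previous", only a line matching the pattern starts a new event, so everything up to the next declaration ends up in one [message].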

Hmm. When I feed it a file, it simply returns:

    "c_port" => 0

If I take the <c_port>3186</c_port> portion out of the input file, I still get:

    "c_port" => 0

My input file starts with:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:bcLogEntry xmlns:ns2="http://www.yahoo.com/wcf/log/v1_0">
    <alert_id>AS01WCF0000000000000001894</alert_id>
    <c_ip>fee3:e1ad:e1ad:daba::3</c_ip>
    <c_port>3186</c_port>
    <cs_Accept_>*/*</cs_Accept_>
    <cs_Accept__length>3</cs_Accept__length>

Using xpath in logstash with namespaces is beyond my talents. You could try

    xml {
        source => "message"
        store_xml => true
        target => "theXML"
        force_array => false
    }
    mutate { replace => { "c_port" => "%{[theXML][c_port]}" } }
    mutate { convert => { "c_port" => "integer" } }

Try an xpath expression like this one. It is taken from my own config, so adapt the path to your document:

    xpath => [
      "/propertyAvailability/hotelRates/hotel/bookingChannel/ratePlan/@id", "ratePlanid"
    ]
    }}

I figured out how to get it done when there is a namespace...

    xml {
        source => "message"
        store_xml => false
        namespaces => { "ns2" => "http://www.yahoo.com/wcf/log/v1_0" }
        xpath => { "/ns2:bcLogEntry/c_port/text()" => "c_port" }
    }
    mutate { replace => { "c_port" => "%{[c_port][0]}" } }
    mutate { convert => { "c_port" => "integer" } }
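
On the earlier "c_port" => 0 result: when the xpath expression matches nothing, the xml filter does not create [c_port]. The mutate replace then most likely stores the unresolved text %{[c_port][0]}, and converting that string to an integer yields 0. A hedged sketch, to be placed inside the filter block after the xml filter, that only post-processes the field when the xpath matched:

    if [c_port] {
      mutate { replace => { "c_port" => "%{[c_port][0]}" } }
      mutate { convert => { "c_port" => "integer" } }
    } else {
      # "no_c_port" is a made-up tag name used only for illustration.
      mutate { add_tag => [ "no_c_port" ] }
    }

That way a missing or unmatched element shows up as a tag on the event instead of a misleading 0.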

Thank you Puneeth.

Thank you Badger, this did the trick. Appreciate it very much.