Targeting Field Content

Hi,
I am trying to get grok to target certain fields and extract the content that follows. For example:

<c_port>3186</c_port>

I want to find c_port and extract the number that follows, in this case 3186. I have tried a number of ways and can't seem to get this to work. Thank you.

I had a similar situation during my log onboarding.
I used mutate gsub instead.

So for your situation, we can use a regex to specify what to capture.

    mutate {
      # Replace "<tag>digits</tag>" with just the digits captured in group 1.
      gsub => ["field_name", "[^\d]*(\d+)[^\>]*\>", "\1"]
    }

Hope this helps.

You could do that using grok or dissect. If the entire [message] is XML then you also have the option of using an xml filter.


Thank you Kavierkoo.
As Badger suggested, I would prefer to use the XML filter or grok option but I can't seem to find how to make either of those two give me the value (3186) that follows that field (<c_port>). Can I get a hint on how I may incorporate the XML filter or grok to accomplish this?

This is a better idea!

Looks like this could help you.

overwrite

  • Value type is array
  • Default value is []

The fields to overwrite.

This allows you to overwrite a value in a field that already exists.

For example, if you have a syslog line in the message field, you can overwrite the message field with part of the match like so:

    filter {
      grok {
        match => { "message" => "%{SYSLOGBASE} %{DATA:message}" }
        overwrite => [ "message" ]
      }
    }

In this case, a line like "May 29 16:37:11 sadness logger: hello world" will be parsed and "hello world" will overwrite the original message.

Kavierkoo,
Thank you for your kindness in taking the time to help me out. I apologize for beating a dead horse, but what I am looking to do is not overwrite anything. I am trying to find a field tag, in this case <c_port>, and then extract the content that follows, in this case 3186, while ignoring the closing tag, in this case </c_port>.
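
For the grok route, a minimal sketch might look like the following. It assumes the element sits somewhere inside [message] and always contains an integer; the target field name c_port is simply reused from the tag. Grok patterns are not anchored unless you anchor them yourself, so the surrounding text does not need to be described.

```
filter {
  grok {
    # Find <c_port>...</c_port> anywhere in [message] and store the
    # digits between the tags in [c_port], converted to an integer.
    match => { "message" => "<c_port>%{INT:c_port:int}</c_port>" }
  }
}
```

Nothing is overwritten here: [message] stays as it was and only the new c_port field is added. If the element is missing, grok tags the event with _grokparsefailure instead.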

This configuration

    input { generator { count => 1 lines => [ '<a><c_port>3186</c_port></a>' ] } }
    filter {
        xml {
            source => "message"
            store_xml => false
            xpath => { "/a/c_port/text()" => "c_port" }
        }
        mutate { replace => { "c_port" => "%{[c_port][0]}" } }
        mutate { convert => { "c_port" => "integer" } }
    }
    output  { stdout { codec => rubydebug { metadata => false } } }

produces

    "c_port" => 3186,

Note that [message] must be a complete and valid XML document for this to work.

To give a better example of the filter configuration I would need to see the complete document, and yes, I realize that may not be possible for you to post.
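
To make the "complete and valid XML" requirement visible while testing, one option is to check for the tag the xml filter adds when parsing fails. This is only a sketch: the xml_status field is a hypothetical name chosen for illustration, and the xpath is the one from the example above.

```
filter {
  xml {
    source => "message"
    store_xml => false
    xpath => { "/a/c_port/text()" => "c_port" }
  }
  # The xml filter tags events it cannot parse with _xmlparsefailure.
  if "_xmlparsefailure" in [tags] {
    # Hypothetical marker field so broken documents are easy to spot in the output.
    mutate { add_field => { "xml_status" => "invalid" } }
  }
}
```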

Wow Badger, did you use a tool for this or did that come out of your head that fast? That worked immediately (of course), but my input will be read from a file. The input section below may be wrong for the intended work, but the input will still come from a file:

    input {
      file {
        path => "/tmp/test2.xml"
        sincedb_path => "/dev/null"
        start_position => "beginning"
        codec => multiline {
          pattern => "^<name=*\>"
          auto_flush_interval => 1
          negate => "true"
          what => "previous"
          max_lines => 1000000000
          max_bytes => "500 MiB"
        }
      }
    }

The line you provided works beautifully, but I am not sure how to incorporate it so that it does the same thing after reading in the file:

input { generator { count => 1 lines => [ '<a><c_port>3186</c_port></a>' ] } }

You would not use that; I was just using it to provide an event where the [message] field was valid XML. Provided your multiline codec consumes a single complete XML document, you should be OK.
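
As a sketch of what "consumes a single complete XML document" could look like, the codec below starts a new event whenever a line begins with an XML declaration. That is an assumption about the file layout (every document starts with a <?xml ...?> line); if yours does not, the pattern needs to match whatever the first line of each document really is.

```
input {
  file {
    path => "/tmp/test2.xml"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    codec => multiline {
      # Any line that does NOT start with the XML declaration is appended
      # to the previous line, so each event holds one whole document.
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # Flush the last document after 1 second without new lines.
      auto_flush_interval => 1
    }
  }
}
```

The filter section stays the same as before; only the input changes.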

Hmm. When I feed it a file, it simply returns:
"c_port" => 0

If I take the <c_port>3186</c_port> portion out of the input file, I still get:
"c_port" => 0

My input file starts with:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:bcLogEntry xmlns:ns2="http://www.yahoo.com/wcf/log/v1_0">
    <alert_id>AS01WCF0000000000000001894</alert_id>
    <c_ip>fee3:e1ad:e1ad:daba::3</c_ip>
    <c_port>3186</c_port>
    <cs_Accept_>*/*</cs_Accept_>
    <cs_Accept__length>3</cs_Accept__length>

Using xpath in logstash with namespaces is beyond my talents. You could try

    xml {
        source => "message"
        store_xml => true
        target => "theXML"
        force_array => false
    }
    mutate { replace => { "c_port" => "%{[theXML][c_port]}" } }
    mutate { convert => { "c_port" => "integer" } }

Try an xpath along these lines. Below is my xpath; you could adapt it to your document:

    xpath => [
      "/propertyAvailability/hotelRates/hotel/bookingChannel/ratePlan/@id", "ratePlanid"
    ]

I figured out how to get it done when there is a namespace...

    xml {
        source => "message"
        store_xml => false
        namespaces => { "ns2" => "http://www.yahoo.com/wcf/log/v1_0" }
        xpath => { "/ns2:bcLogEntry/c_port/text()" => "c_port" }
    }
    mutate { replace => { "c_port" => "%{[c_port][0]}" } }
    mutate { convert => { "c_port" => "integer" } }
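
A possible defensive variation, offered as a sketch rather than a tested config: when the xpath matches nothing, the xml filter should leave [c_port] unset, and the replace then most likely stores the literal text %{[c_port][0]}, which converts to 0. That would explain the 0 seen earlier in this thread. Guarding the mutate steps with a conditional avoids this.

```
filter {
  xml {
    source => "message"
    store_xml => false
    namespaces => { "ns2" => "http://www.yahoo.com/wcf/log/v1_0" }
    xpath => { "/ns2:bcLogEntry/c_port/text()" => "c_port" }
  }
  # Only unwrap the array and convert when the xpath actually matched.
  if [c_port] {
    mutate { replace => { "c_port" => "%{[c_port][0]}" } }
    mutate { convert => { "c_port" => "integer" } }
  }
}
```

With the guard in place, an event without a matching element simply has no c_port field instead of a misleading 0.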

Thank you Puneeth.

Thank you Badger, this did the trick. Appreciate it very much.
