Grok out a field within another field

Hello,

I have the following message:

 [Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]

I want to extract the whole message as monitor_name

 [Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]

Then I also want to extract the remote_host (rwbdfrmmop03) from monitor_name.

I tried something like this, but it did not work:

(?<monitor_name>(?<os>)(?<monitor_type>)(?<remote_host>[a-z0-9])%{GREEDYDATA})

Thanks for the help.

You may be better off working in two phases: first, extract the monitor_name (which I am reading to be a sequence of strings, each bound by square brackets), and then, in a separate filter, extract the relevant bits out of the monitor_name. I would prefer dissect for the second step, since it is a lot simpler to understand and faster at extracting when the pattern of the separators is known:

filter {
  grok {
    "pattern_definitions" => {
        "MONITOR_NAME" => "(\[[^]]+\])+"
    }
    "match" => {
      "message" => "%{MONITOR_NAME:monitor_name}"
    }
  }
  dissect {
    mapping => {
      "monitor_name" => "[%{os}][%{monitor_type}][%{remote_host}][%{}][%{}][%{}]"
    }
  }
}

I defined the pattern for MONITOR_NAME as (\[[^]]+\])+, because it matches any sequence of square-bracketed things:

(  # open group
 \[  # literal open bracket
   [  # character class
    ^]  # excluding a close bracket
      ]+  # closes character class, allows repetition
        \]  # literal close bracket
          )+  # close group, allows repetition

Ideally, when you grok, you'll anchor your pattern to the beginning of your log line (by prefixing it with the ^ character), so that your grok pattern captures from the beginning of your log message. This hugely improves performance because the regular expression doesn't have to try the match again starting with the second character, and again with the third, and so on until it finds a match.
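
As a minimal sketch of that anchoring, the grok filter above could be written as follows. This assumes the bracketed sequence always starts at the very first character of message; if anything precedes it (a timestamp, for example), the anchored pattern would no longer match and the event would be tagged with _grokparsefailure.

```
filter {
  grok {
    pattern_definitions => {
      "MONITOR_NAME" => "(\[[^]]+\])+"
    }
    match => {
      # The leading ^ pins the match to the start of the message, so a
      # non-matching line fails once instead of being retried at every
      # character offset. Assumption: the message begins with "[".
      "message" => "^%{MONITOR_NAME:monitor_name}"
    }
  }
}
```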

@yaauie,

Thanks for the information. I see that it is extracting the monitor_name as follows:

Baseline, [Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]

As you can see, it is extracting Baseline and setting it as monitor_name as well, when in fact the monitor name should only be:

[Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]

This is the config I have:

if ([source] == "my_source") {
  dissect {
    mapping => {
      "message" => "[%{}][%{monitor_name}][%{remote_host}][%{}][%{}][%{}]"
    }
  }
  grok {
    patterns_dir => [ "/apps/elk/configurations/grok-patterns/my_app" ]
    match => [ "message", "%{my_match}" ]
  }
  date {
    match => ["event_timestamp", "M/dd/yyyy HH:mm:ss a"]
    target => "@timestamp"
    timezone => "America/New_York"
  }
}

I attempted to change monitor_name to application_name in the dissect filter, but I do not see the new field being created.

Essentially, I would like it to be as follows:

monitor_name: [Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude] (working)
application_name: Baseline (this is not working)
remote_host: rwbdfrmmop03 (working)

Thanks!

Your dissect is capturing a monitor_name (the contents of the second set of brackets), which would explain why you have Baseline as a value for monitor_name.
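
If the goal is the field layout you listed (application_name and remote_host alongside the full monitor_name), one possible shape for the dissect step is sketched below. It assumes that a grok filter placed before it has already stored the whole bracketed string in monitor_name, and that the string always has exactly six bracketed segments; the field name application_name is taken from your post.

```
filter {
  # Assumption: an earlier grok filter has already populated monitor_name
  # with the full "[...][...][...][...][...][...]" string.
  dissect {
    mapping => {
      # %{} skips a segment; only the second and third are kept, so
      # dissect no longer writes anything into monitor_name itself.
      "monitor_name" => "[%{}][%{application_name}][%{remote_host}][%{}][%{}][%{}]"
    }
  }
}
```

Because dissect reads from monitor_name here rather than from message, it has to come after the grok filter that creates that field.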

With a file line.log containing only your input message with a trailing newline:

[Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]

And the pipeline configuration I gave originally, with input and output sections added for completeness:

input {
  stdin {}
}
filter {
  grok {
    "pattern_definitions" => {
        "MONITOR_NAME" => "(\[[^]]+\])+"
    }
    "match" => {
      "message" => "%{MONITOR_NAME:monitor_name}"
    }
  }
  dissect {
    mapping => {
      "monitor_name" => "[%{os}][%{monitor_type}][%{remote_host}][%{}][%{}][%{}]"
    }
  }
}
output { stdout { codec => rubydebug } }

I get the following when I execute, which appears to be what you say you want:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/125673-grok-dissect-monitor-name }
╰─○ cat line.log | ~/src/elastic/releases/logstash-6.2.2/bin/logstash -f pipeline.conf
Sending Logstash's logs to /Users/yaauie/src/elastic/releases/logstash-6.2.2/logs which is now configured via log4j2.properties
[2018-04-10T21:57:48,707][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-04-10T21:57:49,168][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.2"}
[2018-04-10T21:57:49,521][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-04-10T21:57:52,604][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-04-10T21:57:52,959][INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x135062c run>"}
[2018-04-10T21:57:53,055][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
{
    "monitor_name" => "[Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]",
      "@timestamp" => 2018-04-10T21:57:53.022Z,
            "host" => "castrovel.local",
     "remote_host" => "rwbdfrmmop03",
    "monitor_type" => "Baseline",
         "message" => "[Linux][Baseline][rwbdfrmmop03][Process][srmclient][SiSExclude]",
        "@version" => "1",
              "os" => "Linux"
}
[2018-04-10T21:57:53,808][INFO ][logstash.pipeline        ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x135062c run>"}
[success (20.000s)]

@yaauie,

That worked, thank you!
