Trailing Whitespaces in Message Field

Here is my sample data:

EventQ      00001350  Fri 08/05 20:19:20.541  _00008_  39765 990 HEARTBEAT SYSTEM=>MDETMGR i"SINGLETON" //beep// added to EventQueue for a new total of   1 Pending Event

This is one line and Elastic is identifying the end of each line properly. There is funky spacing in each line.

I have used the grok creator tools out there and can get each individual line working with the \s* or %{SPACE}* pattern, but once I put it into my .conf file and let the logs rip through it, I just get grok parse failures. What am I doing wrong?

This was my latest sad attempt...

"%{WORD:event}%{SPACE}*%{WORD:code}%{SPACE}*%{DAY:day}%{SPACE}*%{MONTHNUM:month}/%{MONTHDAY:day}%{SPACE}*%{TIME:time}%{SPACE}*%{GREEDYDATA:msg}"

When I do

"%{WORD:event}%{SPACE}*%{GREEDYDATA:msg}"

it gives me:

event: E
msg: ventQ 00001350 Fri 08/05 20:19:20.541 00008 39765 990 HEARTBEAT SYSTEM=>MDETMGR i"SINGLETON" //beep// added to EventQueue for a new total of 1 Pending Event

Hello @Micah_Barsness

Could you try the grok pattern below and let us know how it goes?

%{WORD:event}\s*%{WORD:code}\s*%{DAY:day}\s*%{MONTHNUM:month}/%{MONTHDAY:day}\s*%{TIME:time}\s*%{GREEDYDATA:msg}

SPACE is already defined as \s*, so there is no need for an additional asterisk. Have a look here for patterns.

Log fields are usually separated by spaces. \s* matches zero or more whitespace characters; \s+ matches one or more. Which one you use depends on the case.

The \s metacharacter matches a single whitespace character.
Whitespace characters can be:

  • A space character
  • A tab character \t
  • A carriage return character \r
  • A new line character \n
  • A vertical tab character \v
  • A form feed character \f
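
The list above, and the difference between \s* and \s+, can be checked with a short script. This is a minimal sketch in Python, whose `re` module is not the regex engine Logstash uses, so treat it as an approximation. The `joined` sample line and the helper name `is_whitespace` are made up for illustration.

```python
import re

# The six characters from the list above, keyed by a readable name.
WHITESPACE = {
    "space": " ",
    "tab": "\t",
    "carriage return": "\r",
    "new line": "\n",
    "vertical tab": "\v",
    "form feed": "\f",
}


def is_whitespace(ch: str) -> bool:
    """Return True if the single character ch is matched by \\s."""
    return re.fullmatch(r"\s", ch) is not None


listed_all_match = all(is_whitespace(ch) for ch in WHITESPACE.values())

# \s* tolerates a missing separator, \s+ insists on at least one.
spaced = "EventQ      00001350"
joined = "EventQ00001350"  # hypothetical line with no separator at all
star = re.compile(r"[A-Za-z]+\s*\d+")
plus = re.compile(r"[A-Za-z]+\s+\d+")

print(listed_all_match)
print(bool(star.fullmatch(spaced)), bool(star.fullmatch(joined)))
print(bool(plus.fullmatch(spaced)), bool(plus.fullmatch(joined)))
```

With \s+ a line that has no separator fails outright, which is usually easier to notice than a \s* match that silently accepts it.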

When I use this it doesn't work, and it should: I can put it into the grok debugger and it's working just fine. It makes me think there is something else going on... Here's my input/filter:

input {
  beats {
    port => 5047
  }
}

filter {
  grok {
    match => {
      "message" => [
        "%{WORD:event}\s*%{WORD:code}\s*%{DAY:day}\s*%{MONTHNUM:month}/%{MONTHDAY:day}\s*%{TIME:time}\s*%{GREEDYDATA:msg}"
      ]
    }
  }
}

Try with

filter {
  grok {
    match => { "message" => "%{WORD:event}\s+%{WORD:code}\s+%{EXCHTIME:[@metadata][timestamp]}\s*%{DATA:code2}\s+%{NUMBER:code3}\s+%{NUMBER:code4}\s+%{GREEDYDATA:msg}" }
    pattern_definitions => { "EXCHTIME" => "%{DAY:day}\s*%{MONTHNUM:month}/%{MONTHDAY:daynum}\s*%{TIME:hours}" }
  }

  date {
    match => ["[@metadata][timestamp]", "E MM/dd HH:mm:ss.SSS"]
    timezone => "Asia/Dubai"
    remove_field => [ "day", "daynum", "month", "hours" ]
  }
}

Result:

{
         "code2" => "_00008_",
          "code" => "00001350",
         "code4" => "990",
    "@timestamp" => 2022-08-05T16:19:20.541Z,
           "msg" => "HEARTBEAT SYSTEM=>MDETMGR i\\\"SINGLETON\\\" //beep// added to EventQueue for a new total of   1 Pending Event",
       "message" => "EventQ      00001350  Fri 08/05 20:19:20.541  _00008_  39765 990 HEARTBEAT SYSTEM=>MDETMGR i\\\"SINGLETON\\\" //beep// added to EventQueue for a new total of   1 Pending Event",
         "code3" => "39765",
      "@version" => "1"
}

I think I've figured out the issue, need assistance with the solution:

There are trailing whitespaces in each event.

I did some research and saw that NOTSPACE might work better for this but haven't had any luck yet.

%{NOTSPACE:event}%{NOTSPACE:code}%{DAY:day}%{MONTHNUM:month}/%{MONTHDAY:day}%{TIME:time}%{GREEDYDATA:msg}

Any suggestions on how to take care of these hidden characters?
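
Before picking a pattern it may help to find out what the hidden characters actually are. This is a minimal sketch in Python that runs outside the Logstash pipeline. The function name `describe_hidden` and the sample string are hypothetical, and the sample is not taken from the real .trc files.

```python
import unicodedata


def describe_hidden(text: str) -> list:
    """List (index, code point, Unicode name) for every character that is
    whitespace other than a plain space, or that is not printable."""
    found = []
    for i, ch in enumerate(text):
        if ch != " " and (ch.isspace() or not ch.isprintable()):
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical line: a no-break space and a tab sit between the two fields.
sample = "EventQ\u00a0\t00001350"
report = describe_hidden(sample)
for index, code_point, name in report:
    print(index, code_point, name)
```

Running the same function over a line copied from the raw file would show whether the gaps are ordinary spaces or something else.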

Interesting..

A simple,

"%{NOTSPACE:event}%{GREEDYDATA:msg}"

produces the event with no trailing characters

but the characters are left on the "msg" field

To remove trailing or leading whitespaces in Logstash you can use the strip option of the mutate filter.

    filter {
      mutate {
         strip => ["field1", "field2"]
      }
    }

This will remove any leading or trailing whitespaces from the fields specified.

The only problem is that I'm unable to grok to get those fields out. I get grokparsefailure every time I run

%{NOTSPACE:event}%{NOTSPACE:code}%{DAY:day}%{MONTHNUM:month}/%{MONTHDAY:day}%{TIME:time}%{GREEDYDATA:msg}

I believe it's getting tripped up on these whitespaces during the grok.

Also tried:

{NOTSPACE:event}\S+%{NOTSPACE:code}\S+%{DAY:day}\S+%{MONTHNUM:month}/%{MONTHDAY:day}\S+%{TIME:time}\S+%{GREEDYDATA:msg}

no luck :frowning:

You should use \s+ or %{SPACE}, not \S+.

This works in Grok Debugger on Kibana.

%{NOTSPACE:event}\s+%{NOTSPACE:code}\s+%{DAY:day}\s+%{MONTHNUM:month}/%{MONTHDAY:day}\s+%{TIME:time}\s+%{GREEDYDATA:msg}

The response is:

{
  "msg": "_00008_  39765 990 HEARTBEAT SYSTEM=>MDETMGR i\"SINGLETON\" //beep// added to EventQueue for a new total of   1 Pending Event",
  "code": "00001350",
  "month": "08",
  "time": "20:19:20.541",
  "event": "EventQ",
  "day": "05"
}
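
The difference between \s+ and \S+ as a separator can be reproduced outside Logstash. This is a minimal sketch in Python, assuming that `\S+` is a fair stand-in for %{NOTSPACE} and a three-letter alternation is a fair stand-in for %{DAY}. It is an approximation of the grok pattern, not the pattern itself.

```python
import re

line = "EventQ      00001350  Fri 08/05 20:19:20.541  _00008_  39765 990 HEARTBEAT"

DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"

# Lowercase \s+ between fields: "one or more whitespace characters".
with_lower_s = re.compile(
    rf"^(?P<event>\S+)\s+(?P<code>\S+)\s+(?P<day>{DAY})\s+(?P<msg>.*)$"
)

# Uppercase \S+ between fields: "one or more NON-whitespace characters",
# so the separator can never match the spaces that are really there.
with_upper_s = re.compile(
    rf"^(?P<event>\S+)\S+(?P<code>\S+)\S+(?P<day>{DAY})\S+(?P<msg>.*)$"
)

good = with_lower_s.match(line)
bad = with_upper_s.match(line)

print(good.groupdict() if good else "no match with \\s+")
print(bad.groupdict() if bad else "no match with \\S+")
```

The uppercase version demands an unbroken run of non-space characters from the start of the line through the day name, which this log format never provides.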

I understand, I've got it working in grok debuggers but grok is not processing my whitespaces properly.

Here's a view of the raw logs -

My current code (it has to be simplified, otherwise I just get grok failures and it's impossible to troubleshoot, so I'm going one field at a time trying to figure this out):

filter {
  grok {
    match => { "message" => [ "%{NOTSPACE:event}%{GREEDYDATA:msg}" ] }
  }

  grok {
    match => { "msg" => "%{GREEDYDATA:code}%{GREEDYDATA:msg2}" }
  }
}

I can't even get this to work:

input {
  beats {
    port => 5047
  }
}

filter {
  mutate {
    gsub => [
      # replace all whitespace characters or multiple adjacent whitespace characters with one space
      "message", "\s+", " "
    ]
  }
}

Your gsub should already collapse the spacing: \s+ matches a whole run of adjacent whitespace characters as a single match, so three spaces in a row are replaced by one space. If the gaps are still there afterwards, the characters in them are probably not ones that \s matches.
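
What a \s+ substitution does to a run of spaces can be checked outside Logstash. This is a minimal sketch using Python's `re.sub` as a stand-in for the mutate filter's gsub, which runs on a different regex engine, so it is an approximation. The NUL-padded string is a hypothetical contrast case, not something taken from the logs in this thread.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space."""
    return re.sub(r"\s+", " ", text)


line = "EventQ      00001350  Fri 08/05 20:19:20.541"
collapsed = collapse_whitespace(line)

# Hypothetical contrast case: NUL is not whitespace, so \s+ never matches it
# and the substitution leaves the string exactly as it was.
padded_with_nul = "EventQ\x00\x0000001350"
untouched = collapse_whitespace(padded_with_nul)

print(repr(collapsed))
print(repr(untouched))
```

A run of six spaces comes out as one, while characters that \s does not match pass through unchanged.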

Are you using this grok:

%{NOTSPACE:event}\s+%{NOTSPACE:code}\s+%{DAY:day}\s+%{MONTHNUM:month}/%{MONTHDAY:day}\s+%{TIME:time}\s+%{GREEDYDATA:msg}

This should work.

If you want to go field by field just change where you start using GREEDYDATA.

For example:

Start with:

%{NOTSPACE:event}\s+%{GREEDYDATA:msg}

If you get the event and msg fields, go to the next field:

%{NOTSPACE:event}\s+%{NOTSPACE:code}\s+%{GREEDYDATA:msg}

And proceed like this until you parse everything.
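
The field-by-field approach can also be automated to find the first field that breaks the match. This is a minimal sketch in Python, assuming simple regex stand-ins for the grok pieces. The `FIELDS` list, the function name `first_failing_field`, and the second test line are hypothetical.

```python
import re

# Regex stand-ins for the grok pieces, in the order they appear in the line.
FIELDS = [
    ("event", r"\S+"),
    ("code", r"\S+"),
    ("day", r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"),
    ("date", r"\d{2}/\d{2}"),
    ("time", r"\d{2}:\d{2}:\d{2}\.\d{3}"),
]


def first_failing_field(text, fields=FIELDS):
    """Grow the pattern one field at a time, always ending in a catch-all
    msg group, and return the name of the first field that makes the
    match fail. Return None if every step matches."""
    prefix = "^"
    for name, piece in fields:
        prefix += f"(?P<{name}>{piece})\\s+"
        if re.match(prefix + r"(?P<msg>.*)$", text) is None:
            return name
    return None


ok_line = "EventQ      00001350  Fri 08/05 20:19:20.541  _00008_  39765 990 HEARTBEAT"
odd_line = "EventQ  00001350  Fri 2022-08-05 20:19:20.541 x"  # hypothetical date format

print(first_failing_field(ok_line))
print(first_failing_field(odd_line))
```

Each step keeps a catch-all group at the end, which is the same role GREEDYDATA plays in the patterns above.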

From what you shared I see no reason for this grok to not work.

Can you share a sample of your messages as plain text so people can try to replicate your pipeline?

Events      00001350  Mon 08/08 05:56:08.779  <Tx>  [Ready] SysHeartbeat 000    7725 990 HEARTBEAT SYSTEM=>OFFMGR i"SINGLETON" //beep// NOW   9 of 13 258 Offlines _00004_ OFFMGR Ready
OfflinesH   00001350  Mon 08/08 05:56:08.779  _00004_ OFFMGR  RUN001 SleepWalk DispatchEvent
Worker18    000013B4  Mon 08/08 05:56:08.779  [W18] _00010_ SECURITYMGR  Security Ready  worker.processmessage BEGINS 001 @1337
Offlines    00001350  Mon 08/08 05:56:08.779  _00004_ OFFMGR  RUN001 4443  Posting SLEEPWALK
EventQ      000013B4  Mon 08/08 05:56:08.779  _00010_   7724 990 HEARTBEAT SYSTEM=>SECURITYMGR i"SINGLETON" //beep// removed from EventQueue leaving a new total of  No Pending Events
EventQ      00001350  Mon 08/08 05:56:08.779  _00007_   7726 990 HEARTBEAT SYSTEM=>MTOKMGR i"SINGLETON" //beep// added to EventQueue for a new total of   1 Pending Event
Security    000013B4  Mon 08/08 05:56:08.779  _00010_ SECURITYMGR  RUN001 4975  <Rx>  [Ready] SysHeartbeat    7724 990 HEARTBEAT SYSTEM=>SECURITYMGR i"SINGLETON" //beep//
Events      00001350  Mon 08/08 05:56:08.779  <Tx>  [Ready] SysHeartbeat 000    7726 990 HEARTBEAT SYSTEM=>MTOKMGR i"SINGLETON" //beep// NOW   8 of 13 258 MulTokAdd _00007_ MTOKMGR Ready
MulTokAddH  00001350  Mon 08/08 05:56:08.779  _00007_ MTOKMGR  RUN001 SleepWalk DispatchEvent
Worker18    000013B4  Mon 08/08 05:56:08.779  [W18] _00010_ SECURITYMGR  Security Ready  worker.processmessage   ENDS 000 (+ 1) @1337
MulTokAdd   00001350  Mon 08/08 05:56:08.779  _00007_ MTOKMGR  RUN001 4442  Posting SLEEPWALK
Worker16    000013AC  Mon 08/08 05:56:08.779  [W16] _00004_ OFFMGR  Offlines Ready  worker.processmessage BEGINS 001 @1337
Worker18    000013B4  Mon 08/08 05:56:08.779  [W18] _00007_ MTOKMGR  MulTokAdd Ready  worker.processmessage BEGINS 001 @1337
EventQ      000013AC  Mon 08/08 05:56:08.779  _00004_   7725 990 HEARTBEAT SYSTEM=>OFFMGR i"SINGLETON" //beep// removed from EventQueue leaving a new total of  No Pending Events
EventQ      00001350  Mon 08/08 05:56:08.779  _00008_   7727 990 HEARTBEAT SYSTEM=>MDETMGR i"SINGLETON" //beep// added to EventQueue for a new total of   1 Pending Event
EventQ      000013B4  Mon 08/08 05:56:08.779  _00007_   7726 990 HEARTBEAT SYSTEM=>MTOKMGR i"SINGLETON" //beep// removed from EventQueue leaving a new total of  No Pending Events
Events      00001350  Mon 08/08 05:56:08.779  <Tx>  [Ready] SysHeartbeat 000    7727 990 HEARTBEAT SYSTEM=>MDETMGR i"SINGLETON" //beep// NOW   7 of 13 258 MulTokLkup _00008_ MDETMGR Ready
MulTokAdd   000013B4  Mon 08/08 05:56:08.779  _00007_ MTOKMGR  RUN001 4443  <Rx>  [Ready] SysHeartbeat    7726 990 HEARTBEAT SYSTEM=>MTOKMGR i"SINGLETON" //beep//
Offlines    000013AC  Mon 08/08 05:56:08.779  _00004_ OFFMGR  RUN001 4444  <Rx>  [Ready] SysHeartbeat    7725 990 HEARTBEAT SYSTEM=>OFFMGR i"SINGLETON" //beep//
Worker18    000013B4  Mon 08/08 05:56:08.779  [W18] _00007_ MTOKMGR  MulTokAdd Ready  worker.processmessage   ENDS 000 (+ 1) @1337
MulTokLkup  00001350  Mon 08/08 05:56:08.779  _00008_ MDETMGR  RUN001 SleepWalk DispatchEvent
Worker16    000013AC  Mon 08/08 05:56:08.779  [W16] _00004_ OFFMGR  Offlines Ready  worker.processmessage   ENDS 000 (+ 1) @1337
MulTokLkup  00001350  Mon 08/08 05:56:08.779  _00008_ MDETMGR  RUN001 4443  Posting SLEEPWALK

These are stored in .trc files, which I'm using Filebeat to send to Logstash. If you double-click on "Events" at the top left, for example, it's as if the spaces are tied to the word Events.

I can confirm that it does not work. I just get a grok parse failure.

As soon as I insert the \s+ the grok fails.

Here is the code you are suggesting:

input {
  beats {
    port => 5047
  }
}

filter {
  grok {
    match => { "message" => [ "%{NOTSPACE:event}\s+%{GREEDYDATA:msg}" ] }
  }
}

here are the results:

and here is the next phase of the test:

input {
  beats {
    port => 5047
  }
}

filter {
  grok {
    match => { "message" => [ "%{NOTSPACE:event}\s+%{NOTSPACE:code}\s+%{GREEDYDATA:msg}" ] }
  }
}

and the result:

It's like it's not seeing these spaces as spaces...

I can't replicate, the grok works fine for me:

Using this grok:

filter {
    grok {
        match => {
            "message" => "%{NOTSPACE:event}\s+%{NOTSPACE:code}\s+%{DAY:day}\s+%{MONTHNUM:month}/%{MONTHDAY:day}\s+%{TIME:time}\s+%{GREEDYDATA:msg}"
        }
    }
}

These are some samples of the output from Logstash:

{
          "code" => "00001350",
    "@timestamp" => 2022-08-08T16:27:28.359Z,
           "day" => [
        [0] "Mon",
        [1] "08"
    ],
         "event" => "Events",
      "@version" => "1",
          "host" => "elk-lab",
         "month" => "08",
       "message" => "Events      00001350  Mon 08/08 05:56:08.779  <Tx>  [Ready] SysHeartbeat 000    7725 990 HEARTBEAT SYSTEM=>OFFMGR i\"SINGLETON\" //beep// NOW   9 of 13 258 Offlines _00004_ OFFMGR Ready",
          "time" => "05:56:08.779",
           "msg" => "<Tx>  [Ready] SysHeartbeat 000    7725 990 HEARTBEAT SYSTEM=>OFFMGR i\"SINGLETON\" //beep// NOW   9 of 13 258 Offlines _00004_ OFFMGR Ready"
}
{
          "code" => "00001350",
    "@timestamp" => 2022-08-08T16:27:28.437Z,
           "day" => [
        [0] "Mon",
        [1] "08"
    ],
         "event" => "OfflinesH",
      "@version" => "1",
          "host" => "elk-lab",
         "month" => "08",
       "message" => "OfflinesH   00001350  Mon 08/08 05:56:08.779  _00004_ OFFMGR  RUN001 SleepWalk DispatchEvent",
          "time" => "05:56:08.779",
           "msg" => "_00004_ OFFMGR  RUN001 SleepWalk DispatchEvent"
}
{
          "code" => "000013B4",
    "@timestamp" => 2022-08-08T16:27:28.438Z,
           "day" => [
        [0] "Mon",
        [1] "08"
    ],
         "event" => "Worker18",
      "@version" => "1",
          "host" => "elk-lab",
         "month" => "08",
       "message" => "Worker18    000013B4  Mon 08/08 05:56:08.779  [W18] _00010_ SECURITYMGR  Security Ready  worker.processmessage BEGINS 001 @1337",
          "time" => "05:56:08.779",
           "msg" => "[W18] _00010_ SECURITYMGR  Security Ready  worker.processmessage BEGINS 001 @1337"
}