Finding fields within square brackets

I'm brand new to this, so please bear with me...

I have a log from an application with inconsistent lines - as in, the content of each line is different - so that might be a challenge in itself... but later.
For now I'm using stdin and stdout so I can paste text into a command line and have Logstash respond with the matches.
An example line of the log is:
2019-01-12 02:59:54.324 Trace [T24ServiceConnector] Sending OFS request [Tx be17fc06-f7c4-414c-b182-e5d41201fdeb]: ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,

So far, my conf file looks like this:
input { stdin { } }

filter {
  grok {
    patterns_dir => ["C:\Logstash\patterns"]
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:LogLevel}" }
  }

  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
    timezone => "Australia/Sydney"
    target => "@timestamp"
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

This has mostly been through googling and trial and error. It took a while to get the timestamp bit to work.

The output so far is

{
    "@timestamp" => 2019-01-11T15:59:54.324Z,
    "@version" => "1",
    "timestamp" => "2019-01-12 02:59:54.324",
    "LogLevel" => "Trace",
    "message" => "2019-01-12 02:59:54.324 Trace [T24ServiceConnector] Sending OFS request [Tx be17fc06-f7c4-414c-b182-e5d41201fdeb]: ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,\r",
    "host" => "SYMV170150"
}

Now I'm trying to grab the next bit [T24ServiceConnector] but cannot work it out.

If I add another 'WORD'

input { stdin { } }

filter {
  grok {
    patterns_dir => ["C:\Logstash\patterns"]
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:LogLevel} %{WORD:LogSource}" }
  }

  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
    timezone => "Australia/Sydney"
    target => "@timestamp"
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

I just get a parse failure

{
    "@version" => "1",
    "message" => "2019-01-12 02:59:54.324 Trace [T24ServiceConnector] Sending OFS request [Tx be17fc06-f7c4-414c-b182-e5d41201fdeb]: ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,\r",
    "host" => "SYMV170150",
    "@timestamp" => 2019-01-23T22:57:14.953Z,
    "tags" => [
        [0] "_grokparsefailure"
    ]
}

I think I need a custom pattern, which is why I added the patterns directory.
The patterns file just contains
LOGSOURCE \W\b\w+\b\W
right now.

Replacing WORD with LOGSOURCE for the third field also gives a parse failure:

{
    "host" => "SYMV170150",
    "@timestamp" => 2019-01-23T23:03:41.845Z,
    "message" => "2019-01-12 02:59:54.324 Trace [T24ServiceConnector] Sending OFS request [Tx be17fc06-f7c4-414c-b182-e5d41201fdeb]: ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,\r",
    "@version" => "1",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}

So... I'm looking for pointers on how to grab the bits I want. Ultimately I need the log source, the entry type, the transaction ID, and the main body as separate fields. I'm hoping that once I get the initial pattern for the square brackets, I can butcher the rest together :slight_smile:

I'm getting there :slight_smile: The match is now:
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:LogLevel} \s*%{BRACKETEDWORD:LogSource}" }

Where
BRACKETEDWORD \[%{WORD}\]

So it looks like I have to escape the brackets, and also account for the spaces.

{
    "@timestamp" => 2019-01-11T15:59:54.324Z,
    "LogLevel" => "Trace",
    "host" => "SYMV170150",
    "message" => "2019-01-12 02:59:54.324 Trace [T24ServiceConnector] Sending OFS request [Tx be17fc06-f7c4-414c-b182-e5d41201fdeb]: ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,\r",
    "LogSource" => "[T24ServiceConnector]",
    "@version" => "1",
    "timestamp" => "2019-01-12 02:59:54.324"
}

I would dissect that rather than grok it.

dissect { mapping => { "message" => "%{ts} %{+ts} %{level} [%{source}] %{msg1} [Tx %{txId}]: %{msg2}" } }

If you are going to grok it, then use two windows. In one, run Logstash with the -r flag so that it reloads the pipeline every time you edit the config (this saves a huge amount of time). Instead of a stdin input, put a couple of messages in a file and use a file input. That way it processes the same few messages every time you edit the filter.

input { file { path => "/home/user/foo.txt" sincedb_path => "/dev/null" start_position => "beginning" } }
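
For example, the restart-on-edit run would look something like this (the path and config name are just placeholders):

bin/logstash -r -f test.conf

where -r is shorthand for --config.reload.automatic.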

In the other, edit the config. For a small number of patterns it may be easier to use pattern_definitions:

filter {
    grok {
        pattern_definitions => { "LOGSOURCE" => "\W\b\w+\b\W" }
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:LOGLEVEL} \[%{WORD:LOGSOURCE}\]" }
    }
}

Ooh! I didn't know you could use something else to carve up the line.

That's awesome :smile: Even better, it works on some of the other lines too

{
    "@version" => "1",
    "MainBody" => "ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,\r",
    "LogSource" => "T24ServiceConnector",
    "Description" => "Sending OFS request",
    "message" => "2019-01-12 02:59:54.324 Trace [T24ServiceConnector] Sending OFS request [Tx be17fc06-f7c4-414c-b182-e5d41201fdeb]: ENQUIRY.SELECT,,SOMEUSER//AU0010001,RB.CARD.APP.HEARTBEAT,\r",
    "timestamp" => "2019-01-12 02:59:54.324",
    "txId" => "be17fc06-f7c4-414c-b182-e5d41201fdeb",
    "@timestamp" => 2019-01-11T15:59:54.324Z,
    "host" => "SYMV170150",
    "LogLevel" => "Trace "
}

How would I trim the spaces out of the LogLevel entry?

If you are using dissect, then the documentation covers how to handle visual alignment (padding).
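
For example, the -> modifier tells dissect to soak up repeated delimiters after a field, so something like this (untested) should stop a run of spaces ending up in the level field:

dissect { mapping => { "message" => "%{ts} %{+ts} %{level->} [%{source}] %{msg1} [Tx %{txId}]: %{msg2}" } }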

If you are using grok, then typically you would replace ' ' with '%{SPACE}' in the pattern, so that it consumes any run of whitespace.
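
Something like this (untested) for the start of your pattern:

match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{WORD:LogLevel}%{SPACE}\[%{WORD:LogSource}\]" }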

Not sure if I need to start a new thread or not but....

I'm using the dissect command as you suggested and it works well for most lines.
However, some lines lack the colon after the [Tx yyyyy], causing a _dissectfailure.

If I remove the colon from the match, then those lines work and the others fail. Evidently, I need a conditional entry, but I think I probably need the entire expression to be evaluated, so it's kind of like:
if [entire dissect] is ok, then execute dissect, else execute the alternative dissect.

Is that possible? Or is there a way to do something like If exist ']:' then dissect 1, else dissect 2?

If the pattern is consistent, then dissect is fast and easy. If the pattern is variable, then grok may be a better solution. If the colon after the transaction id is sometimes missing, then dissect is probably not a good fit.

You might want to post a new question showing the two different log lines you want to match (with either </> or ``` so that we can see them) and the grok pattern you are trying to use. Conditional pattern matching in grok is certainly a thing.
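
That said, the "if exist ']:'" idea you describe can be written directly as a conditional; a sketch (untested, reusing the earlier field names):

if "]:" in [message] {
    dissect { mapping => { "message" => "%{ts} %{+ts} %{level} [%{source}] %{msg1} [Tx %{txId}]: %{msg2}" } }
} else {
    dissect { mapping => { "message" => "%{ts} %{+ts} %{level} [%{source}] %{msg1} [Tx %{txId}]%{msg2}" } }
}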

Since I'm back to grok...

Is there a way to strip out the brackets?

ie,

BRACKETEDWORD \[%{WORD}\]

results in an output of:

"LogSource" => "[T24ServiceConnector]",

What I'd like is for it to say:
"LogSource" => "T24ServiceConnector",

This will also apply to the Tx ID once I work out how to get past the 'phrase' (other question), since I'll want TxID to be cfbd08c6-9fea-4d88-b15b-6d0637418452, not [Tx cfbd08c6-9fea-4d88-b15b-6d0637418452], as I expect it will come out presently.

Note that you can use a combination of dissect and grok. Use dissect to chop up the consistently formatted part of the line, then grok the msg2.

dissect { mapping => { "message" => "%{ts} %{+ts} %{level} [%{source}] %{msg1} [Tx %{txId}]%{msg2}" } }
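
Then something along these lines (a sketch; BodyDetail is just a placeholder name) to strip the optional colon and grab the rest:

grok { match => { "msg2" => "^:?%{SPACE}%{GREEDYDATA:BodyDetail}" } }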

In grok, if you don't want to capture BRACKETEDWORD then why define a pattern for it? Just use

\[%{WORD:fieldname}\]

in your pattern.

Thanks!

I literally only got Logstash running on Tuesday with the stdin/stdout setup so I could see what is happening, and I don't really understand how the matching etc. works. I can't even figure out how to write the config across multiple lines so it is easier to read - every time I try, I start getting errors.

Is "message" a keyword? I tried changing it yesterday to "OriginalMessage" and got errors so put it back.
If I wanted to feed msg2 from your dissect into grok, do I put "msg2" instead of "message" after the match? Is it saying feed this variable into the matching filter to the right?
Does that also mean I could grok twice?

Your example for the bracketed word works perfectly - I think I'm starting to see how it works :slight_smile:

So now I've got

grok {
  patterns_dir => ["C:\Logstash\patterns"]
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:LogLevel} \s*\[%{WORD:LogSource}\] %{DATA:Description} \[Tx %{TRANSACTIONID:TxID}\].? %{GREEDYDATA:BodyDetail}" }
}

with only the TransactionID defined

TRANSACTIONID ([a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12})

and the output is what I want :smiley: I just need to check it works for all cases.

Then I need to work out how to get a beat to read the file and send it to Logstash, but that can be another thread, as can how to graph the transaction duration, which is why I started looking at this in the first place :smile:
Really appreciate your help and patience

Yes, yes, and yes.
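
For instance, a sketch of grokking twice (field names are placeholders) - the key on the left of each match is simply whichever field you want to parse, and "message" is just the default field events arrive in:

grok { match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{GREEDYDATA:rest}" } }
grok { match => { "rest" => "%{WORD:LogLevel}%{SPACE}\[%{WORD:LogSource}\]" } }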
