Variable fields in Logstash's grok pattern


(nabil mohamed) #1

Dear all,

Kindly, I need some help with the issue below:

The raw data file that I am shipping with Filebeat has a variable number of fields, and I need to build the grok pattern for it in Logstash.

Raw data:

    001:2019-04-17 00:00:00 003:201090743559 009:9116435510269278 010: 014:Vodafone 015:85 020:MT 036: 049: 053:367105977,367105978,367105979
    001:2019-04-17 00:00:00 003:201090743559 009:9116435510269278 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999
    001:2019-04-17 00:00:00 003:201012616667 009:3589600580689119 010: 014:Vodafone 015:85 020:MT 036: 049: 053:367105986,367105987,367105988

As you will notice, the fields are arranged in numbered order: 001, 002, 003, 004, ... 053.
What I need to do in the grok pattern is:

    001:%{DATA:x} 002:%{DATA:y} 003:%{DATA:z} ...

But please note that these fields are optional and vary from one record to another, and that is the problem: how can I make them optional in the grok pattern?

I tried something like this as a trial, but grok rejected it:

    001:%{DATA:x} (002:%{DATA:y})? (003:%{DATA:z})? ...
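
For context: grok patterns are Oniguruma regular expressions under the hood, so optional sections can in principle be written as non-capturing groups, along the lines of this untested sketch:

    001:%{DATA:x} (?:002:%{DATA:y} )?(?:003:%{DATA:z} )?...

Even if such a pattern is accepted, spelling out fifty-odd optional fields this way quickly becomes unwieldy, hence the kv-based suggestion below.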

Regards,
Nabil


#2

You might be better off trying to use a kv filter:

    kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
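
For reference, a minimal sketch of how this could sit in the filter section of a pipeline configuration:

    filter {
      # Each new field starts at a space followed by "0", so splitting on " 0"
      # separates the "0NN:value" pairs; ":" then splits key from value.
      kv {
        field_split_pattern => " 0"
        value_split => ":"
        whitespace => strict
      }
    }

Note that the split consumes the leading zero of each subsequent tag, so keys after the first come out as "03", "09", "12", and so on rather than "003", "009", "012".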

(nabil mohamed) #3

First of all, thanks as usual for your help, but please check the points below:

Output:

    "hits" : {
      "total" : 30,
      "hits" : [
        {
          "001" : "2019-04-18",
          "source" : "/var/log/logstash-tutorial.log",
          "09" : "3528700523825626 ",
          "0" : "00:01",
          "message" : "001:2019-04-18 00:00:01 003:201006773798 009:3528700523825626 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999",
          "03" : "201006773798 ",
          "22" : "1 ",
          "32" : "triggered_SMS by trigger id 99999999",
          "12" : "RemoteNotify ",
          "@timestamp" : "2019-04-18T07:48:01.168Z"
        }
Now we have the fields separated as per your solution, but we have some points:

First: in the message above, "001:2019-04-18 00:00:01" should be one field, i.e. "001" = "2019-04-18 00:00:01", yet the split broke the timestamp across "001" and "0".
Second: how can I identify this "001" field as a date, to be used instead of the Elastic timestamp?
Third: why does the index have 30 hits, even though the raw data file has only 10 records?

Finally, thanks again; your solution solves about 80% of the issue.

Regards,
Nabil


#4

You can remove and parse the timestamp using:

    # Pull the two timestamp tokens after "001:" into a single "ts" field.
    dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
    # Parse "ts" and use it as the event's @timestamp.
    date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
    # Strip the timestamp from the message so the kv split cannot break it apart.
    mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
    # Split the remaining "0NN:value" pairs into fields.
    kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
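
Putting it together, a minimal end-to-end sketch; the input and output stubs here are hypothetical placeholders, so adjust the port, hosts, and index naming to your own setup:

    input { beats { port => 5044 } }
    filter {
      dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
      date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
      mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
      kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
    }
    output { elasticsearch { hosts => ["localhost:9200"] } }

The order of the filters matters: the timestamp has to be dissected and removed before the kv filter runs, otherwise the " 0" split will break "2019-04-17 00:00:00" apart again.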