Variable fields in Logstash's Grok-Pattern

Hi all,

Kindly, I need some help with the issue below -->

The raw data file that I am shipping with Filebeat has a variable number of fields, and I need to build a grok pattern for it in Logstash.

Raw data -->
001:2019-04-17 00:00:00 003:201090743559 009:9116435510269278 010: 014:Vodafone 015:85 020:MT 036: 049: 053:367105977,367105978,367105979

001:2019-04-17 00:00:00 003:201090743559 009:9116435510269278 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999

001:2019-04-17 00:00:00 003:201012616667 009:3589600580689119 010: 014:Vodafone 015:85 020:MT 036: 049: 053:367105986,367105987,367105988

So if you notice, you will find the fields are arranged in this manner: 001, 002, 003, 004, ..., 053.
So what I need to do in the grok pattern is -->
001:%{DATA:x} 002:%{DATA:y} 003:%{DATA:z} ...
But please note that these fields are optional and vary from one record to another, and here is the problem: how can I make them optional in the grok pattern?

I tried something like this as a trial, but it was rejected by grok -->
001:%{DATA:x} (002:%{DATA:y})? (003:%{DATA:z})? ...

Regards,
Nabil

You might be better off trying to use a kv filter

kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
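
In a full pipeline that might look something like this minimal sketch (assuming the space-separated sample above; field_split_pattern is a regex, so each field boundary is a space followed by the next tag's leading zero):

    filter {
      kv {
        # Split fields on " 0" (a space plus the next tag's leading zero).
        field_split_pattern => " 0"
        # Split each key from its value at the first ":".
        value_split => ":"
        whitespace => strict
      }
    }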

First of all, thanks as usual for your help, but please check the below, as I have some points -->

Output -->
"hits" : {
"total" : 30,
"hits" : [
{
"001" : "2019-04-18",
"source" : "/var/log/logstash-tutorial.log",
"09" : "3528700523825626 ",
"0" : "00:01",
"message" : "001:2019-04-18 00:00:01 003:201006773798 009:3528700523825626 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999",
"03" : "201006773798 ",
"22" : "1 ",
"32" : "triggered_SMS by trigger id 99999999",
"12" : "RemoteNotify ",
"@timestamp" : "2019-04-18T07:48:01.168Z"
}

Now we have separated the fields as per your solution, but we have some points:
First: if we look at the message above, we can see that "001:2019-04-18 00:00:01" should be one field, i.e. "001" = "2019-04-18 00:00:01".
Second: how can I identify the field "001" as a date, to be used instead of the Elastic timestamp?
Third: why does the index have 30 hits, despite the raw data file having only 10 records?!

Finally, again, your solution solves 80% of the issue. Thanks!

Regards,
Nabil

You can remove and parse the timestamp using

    dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
    date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
    mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
    kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
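
Put together, a sketch of the whole filter section (for the space-separated sample data), with one comment per step:

    filter {
      # Pull "001:<date> <time>" off the front of the message into ts.
      dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
      # Parse ts and use it as the event's @timestamp.
      date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
      # Strip the already-consumed "001:..." prefix from the message.
      mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
      # Split the remaining tag:value pairs into fields.
      kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
    }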

Thanks, your reply solves the issue,

but I have some questions, to understand these filters gathered together -->
dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
* Here I think we collect the "001" field into "ts" in two parts, ts and +ts, and then we leave the rest of the message.
mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
* This part I cannot understand; it would be great if you could just clarify the point here?

Thanks,

The regexp matches 001: followed by a group of characters that are not a space, followed by a space, followed by another group of characters that are not a space, followed by a space. I.e., it removes the part of the message that was consumed by dissect.
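
As an illustration, using the message from the output above (the trailing "..." is just truncation):

    # Before the gsub:
    #   message = "001:2019-04-18 00:00:01 003:201006773798 009:3528700523825626 ..."
    # "001:[^ ]+ [^ ]+ " matches the leading "001:2019-04-18 00:00:01 ", so afterwards:
    #   message = "003:201006773798 009:3528700523825626 ..."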

Perfect!

But why do you use that "" at the end, after "001:[^ ]+ [^ ]+ "?

Thanks,

To replace whatever matches that pattern (i.e. "001:2019-04-18 00:00:01 ") with an empty string. That is, it removes that from the start of the string.

Thanks, it is clear now :smiley:

I re-ran them again, but there is a failure in dissect, as below. Is there any help for that? -->

[WARN ] 2019-04-21 18:11:46.225 [[main]>worker0] Dissector - Dissector mapping, pattern not found {"field"=>"message", "pattern"=>"001:%{ts} %{+ts} %{}", "event"=>{"log"=>{"file"=>{"path"=>"/var/log/logstash-tutorial.log"}}, "source"=>"/var/log/logstash-tutorial.log", "prospector"=>{"type"=>"log"}, "@timestamp"=>2019-04-21T17:53:49.009Z, "host"=>{"name"=>"nabilmohamed2c.mylabserver.com"}, "input"=>{"type"=>"log"}, "message"=>"001:2019-04-21 00:00:01\t003:201069393867\t009:3528921046739912\t010:\t014:Vodafone\t015:85\t020:MT\t036:\t049:\t053:369330884,369330885,369330886", "@version"=>"1", "beat"=>{"version"=>"6.7.1", "hostname"=>"nabilmohamed2c.mylabserver.com", "name"=>"nabilmohamed2c.mylabserver.com"}, "offset"=>264, "tags"=>["beats_input_codec_plain_applied", "_dissectfailure"]}}

  • The error says that the pattern was not found!
    However, the raw data looks like this -->
    001:2019-04-21 00:00:01 003:201006773798 009:3528700523825626 010: 014:Vodafone 015:85 020:MT 036: 049: 053:369330872,369330873,369330875

Thanks,

Your fields are tab-separated, not space-separated, so you need to adjust the dissect, mutate+gsub, and kv filters to match that.
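
A sketch of a tab-aware version (untested against this exact data): the dissect delimiter is literal text, so "\t" is only interpreted as a tab if config.support_escapes: true is set in logstash.yml (otherwise paste a literal tab character into the config); the gsub and kv patterns are regexes, where \t already means a tab.

    filter {
      # "\t" needs config.support_escapes: true, or use a literal tab here.
      dissect { mapping => { "message" => "001:%{ts} %{+ts}\t%{}" } }
      date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
      # Regex: strip "001:" plus everything up to and including the first tab.
      mutate { gsub => [ "message", "001:[^\t]+\t", "" ] }
      # Split fields on a tab followed by the next tag's leading zero.
      kv { field_split_pattern => "\t0" value_split => ":" whitespace => strict }
    }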


Thanks, this solved the problem. I made the raw file entirely space-separated, so it does not include any tabs.

Thanks again!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.