Variable fields in Logstash's grok pattern


(nabil mohamed) #1

Dear all,

Kindly, I need some help with the issue below:

The raw data file that I am shipping with Filebeat has a variable number of fields, and I need to build the grok pattern for it in Logstash.

Raw data:

    001:2019-04-17 00:00:00 003:201090743559 009:9116435510269278 010: 014:Vodafone 015:85 020:MT 036: 049: 053:367105977,367105978,367105979
    001:2019-04-17 00:00:00 003:201090743559 009:9116435510269278 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999
    001:2019-04-17 00:00:00 003:201012616667 009:3589600580689119 010: 014:Vodafone 015:85 020:MT 036: 049: 053:367105986,367105987,367105988

As you will notice, the fields are arranged in numbered order: 001, 002, 003, 004, ... 053.
What I need to do in the grok pattern is:

    001:%{DATA:x} 002:%{DATA:y} 003:%{DATA:z} ...

But please note that these fields are optional and vary from one record to another, and that is the problem: how can I make them optional in the grok pattern?

I tried something like this as a trial, but grok rejected it:

    001:%{DATA:x} (002:%{DATA:y})? (003:%{DATA:z})? ...
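
For context: grok patterns are Oniguruma regular expressions under the hood, so optional sections can in principle be written as non-capturing groups, along the lines of this untested sketch:

    001:%{DATA:x} (?:002:%{DATA:y} )?(?:003:%{DATA:z} )?...

Even if such a pattern is accepted, spelling out fifty-odd optional fields this way quickly becomes unwieldy, hence the kv-based suggestion below.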

Regards,
Nabil


#2

You might be better off trying to use a kv filter:

    kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
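
For reference, a minimal sketch of how this could sit in the filter section of a pipeline configuration:

    filter {
      # Each new field starts at a space followed by "0", so splitting on " 0"
      # separates the "0NN:value" pairs; ":" then splits key from value.
      kv {
        field_split_pattern => " 0"
        value_split => ":"
        whitespace => strict
      }
    }

Note that the split consumes the leading zero of each subsequent tag, so keys after the first come out as "03", "09", "12", and so on rather than "003", "009", "012".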

(nabil mohamed) #3

First of all, thanks as usual for your help, but please check the points below:

Output:

    "hits" : {
      "total" : 30,
      "hits" : [
        {
          "001" : "2019-04-18",
          "source" : "/var/log/logstash-tutorial.log",
          "09" : "3528700523825626 ",
          "0" : "00:01",
          "message" : "001:2019-04-18 00:00:01 003:201006773798 009:3528700523825626 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999",
          "03" : "201006773798 ",
          "22" : "1 ",
          "32" : "triggered_SMS by trigger id 99999999",
          "12" : "RemoteNotify ",
          "@timestamp" : "2019-04-18T07:48:01.168Z"
        }
Now we have the fields separated as per your solution, but we have some points:

First: in the message above, "001:2019-04-18 00:00:01" should be one field, i.e. "001" = "2019-04-18 00:00:01", yet the split broke the timestamp across "001" and "0".
Second: how can I identify this "001" field as a date, to be used instead of the Elastic timestamp?
Third: why does the index have 30 hits, even though the raw data file has only 10 records?

Finally, thanks again; your solution solves about 80% of the issue.

Regards,
Nabil


#4

You can remove and parse the timestamp using:

    # Pull the two timestamp tokens after "001:" into a single "ts" field.
    dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
    # Parse "ts" and use it as the event's @timestamp.
    date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
    # Strip the timestamp from the message so the kv split cannot break it apart.
    mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
    # Split the remaining "0NN:value" pairs into fields.
    kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
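
Putting it together, a minimal end-to-end sketch; the input and output stubs here are hypothetical placeholders, so adjust the port, hosts, and index naming to your own setup:

    input { beats { port => 5044 } }
    filter {
      dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
      date { match => [ "ts", "YYYY-MM-dd HH:mm:ss" ] }
      mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
      kv { field_split_pattern => " 0" value_split => ":" whitespace => strict }
    }
    output { elasticsearch { hosts => ["localhost:9200"] } }

The order of the filters matters: the timestamp has to be dissected and removed before the kv filter runs, otherwise the " 0" split will break "2019-04-17 00:00:00" apart again.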