-So if you notice, you will find the fields are arranged in this manner: 001, 002, 003, 004, ..., 053.
-So what I need to do in the grok pattern is:
001:%{DATA:x} 002:%{DATA:y} 003:%{DATA:z} ...
But please note that these fields are optional and vary from one record to another, and here is the problem: how can I make them optional in the grok pattern?
-I tried something like this as a trial, but it was rejected by grok:
001:%{DATA:x} (002:%{DATA:y})? (003:%{DATA:z})? ...
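Since every field is a numbered key:value pair, one way around the optional-field problem is to skip grok entirely and let the kv filter split the pairs. A minimal sketch (the `field_split`/`value_split` settings are real kv options, but they are assumptions about your exact layout and may need tuning):

```
filter {
  kv {
    source      => "message"
    field_split => " "    # pairs are separated by spaces
    value_split => ":"    # key and value are separated by ":"
  }
}
```

Note that the 001 value itself contains a space ("2019-04-18 00:00:01"), so that field would need separate handling before the kv stage.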
Thanks as usual for your help, but please check the below, as I have some points.
Output:
"hits" : {
"total" : 30,
"hits" : [
{
"001" : "2019-04-18",
"source" : "/var/log/logstash-tutorial.log",
"09" : "3528700523825626 ",
"0" : "00:01",
"message" : "001:2019-04-18 00:00:01 003:201006773798 009:3528700523825626 012:RemoteNotify 022:1 032:triggered_SMS by trigger id 99999999",
"03" : "201006773798 ",
"22" : "1 ",
"32" : "triggered_SMS by trigger id 99999999",
"12" : "RemoteNotify ",
"@timestamp" : "2019-04-18T07:48:01.168Z"
}
-Now we have separated the fields as per your solution, but I have some points:
First: if we look at the message above, we can see that "001:2019-04-18 00:00:01" is one field: "001" = "2019-04-18 00:00:01".
Second: how can I identify this field "001" as a date, to be used instead of the Elastic timestamp?
Third: why does the index have 30 hits, despite the raw data file having only 10 records?!
Finally, your solution again solves the issue by 80%, thanks.
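For the second point, the usual approach is the date filter: once the "001" value has been captured into its own field (assumed here to be called ts; adjust to whatever your configuration produces), it can be parsed and written into @timestamp. A sketch under that assumption:

```
filter {
  date {
    match  => [ "ts", "yyyy-MM-dd HH:mm:ss" ]
    target => "@timestamp"   # this is the default target, shown for clarity
  }
}
```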
But I have some questions to understand how these filters work together:
dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
*Here I think we collect the "001" field into "ts" in two parts, ts and +ts, and then we leave the rest of the message.
mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
*This part I cannot understand; it would be great if you could clarify this point.
The regexp matches 001: followed by a group of characters that are not spaces, followed by a space, followed by another group of non-space characters, followed by a space. That is, it matches the part of the message that was consumed by dissect.
The gsub then replaces whatever matches that pattern (i.e. "001:2019-04-18 00:00:01 ") with an empty string, removing that prefix from the start of the message.
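To make the combined effect concrete, here is the same pair of filters with the intermediate values spelled out as comments (the sample line is taken from the message field above):

```
filter {
  # message: "001:2019-04-18 00:00:01 003:201006773798 ..."
  dissect { mapping => { "message" => "001:%{ts} %{+ts} %{}" } }
  # ts is now "2019-04-18 00:00:01"; %{+ts} appends the time part,
  # separated by a space, and %{} discards nothing (it just skips matching the rest)

  mutate { gsub => [ "message", "001:[^ ]+ [^ ]+ ", "" ] }
  # message is now "003:201006773798 ..." with the consumed prefix removed
}
```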
The error says that the pattern was not found!
However, the raw data looks like this:
001:2019-04-21 00:00:01 003:201006773798 009:3528700523825626 010: 014:Vodafone 015:85 020:MT 036: 049: 053:369330872,369330873,369330875
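One thing worth checking: the long 053 value wraps onto a second physical line in the raw file, and a pattern anchored at "001:" will never match that continuation line on its own, which could explain a "pattern not found" error. If that is the cause, a multiline codec on the input can glue continuation lines back onto their record. A hedged sketch (the path is taken from the output above; the codec settings are assumptions about your line layout):

```
input {
  file {
    path => "/var/log/logstash-tutorial.log"
    codec => multiline {
      # every real record starts with "001:"; any other line is a continuation
      pattern => "^001:"
      negate  => true
      what    => "previous"
    }
  }
}
```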