Multiple patterns regrouping in one line index

Hamza_Dhahri · May 8, 2018, 10:18pm

2018-02-07T10:42:02,831 [ ExtractDwhData] [INFO ] Solife :: Tools :: DWH :: ITK DWH - version : 2.6.0-SNAPSHOT - build #265 on 2018-01-04 08:22:32
2018-02-07T10:42:02,832 [ ExtractDwhData] [INFO ] Starting DWH Data Extraction with run timestamp : 2018-02-07 10:42:02
2018-02-07T12:24:45,167 [ ExtractDwhData] [INFO ] Solife DWH data EXTRACTION finished in 1 hours, 42 minutes, 42.368 seconds

Hello everyone,

For these three lines I use this config file:

input {
  file {
    type => "test1"
    path => ["C:/Users/THINKPAD/Downloads/logstash-6.2.2/essai/."]
  }
}

filter {
  if [type] == "test1" {
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp}%{GREEDYDATA:message1}\s+Extraction\sbatch\sID\s:\s%{NUMBER:ID_extraction_globale}",
                "message", "%{TIMESTAMP_ISO8601:start_time_extraction_globale}%{GREEDYDATA:message2}\sStarting\sDWH\sData\sExtraction%{GREEDYDATA:message3}",
                "message", "%{TIMESTAMP_ISO8601:END_TIME}%{GREEDYDATA:message4}\sSolife\sDWH\sdata\sEXTRACTION\sfinished\sin%{GREEDYDATA:temps_totales}"]
    }

    mutate {
      remove_field => [ "message1", "message2", "message3", "message4" ]
    }

    if "_grokparsefailure" in [tags] {
      drop {}
    }
  }
}

output {
  if [type] == "test1" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "globalextraction"
    }
    stdout {
      codec => rubydebug
    }
  }
}

I want the results of those three patterns to end up in Elasticsearch in one line that contains all the information.
The problem is that each pattern produces its own line, so the Elasticsearch table has three lines.
Could someone help me add a command that regroups the parsing results into one line?

Badger · May 9, 2018, 1:05am

Please read what you posted, it is really hard to read. I do not understand why you have a "Extraction\sbatch\sID" when none of your messages match it.

It sounds like an aggregate filter might be able to do this.

For the grok, personally I would prefer dissect, but that's not a big deal. Do a grok to split the timestamp off the message, then do a grok to ignore the "ExtractDwhData" and ignore the log level. Then you really do have unstructured data that grok might work for.

if "_grokparsefailure" in [tags] {
drop {}
}

This tends to be a bad idea. If your configuration fails to parse the data then you will usually be better off tagging it for review than dropping it.
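
As a rough sketch of the "tag for review" alternative: grok already adds the _grokparsefailure tag on its own, so the output section can route those events somewhere they can be inspected instead of dropping them in the filter. The file path below is a made-up placeholder; the host and index name are the ones from the original post.

```
output {
  if "_grokparsefailure" in [tags] {
    # Hypothetical location; any file that is easy to inspect later will do.
    file { path => "C:/temp/grok_failures.log" }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "globalextraction"
    }
  }
}
```

With this in place the drop {} in the filter section would be removed, so that failed events actually reach the output.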

The reason I prefer not to use grok is that a GREEDYDATA anywhere except at the end of the message, such as

%{GREEDYDATA:message2}\sStarting\sDWH\sData\sExtraction%{GREEDYDATA:message3}

can get really expensive. The regexp processor will have to step through the message one character at a time seeing if the rest of the message matches. This can lead to timeouts.
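
For illustration, a pattern along the following lines should keep that cost under control: it is anchored with ^ so that lines which do not start with a timestamp fail quickly, and the only GREEDYDATA sits at the very end. The field names ts, source, loglevel and text are illustrative choices, not taken from the original config, and the sketch assumes every line has the "timestamp [source] [LEVEL ] text" layout of the samples above.

```
filter {
  grok {
    # Anchored at the start of the line; GREEDYDATA appears only at the end.
    match => { "message" => "^%{TIMESTAMP_ISO8601:ts} \[%{DATA:source}\] \[%{LOGLEVEL:loglevel}\s*\] %{GREEDYDATA:text}" }
  }
}
```

The specific checks ("Starting DWH Data Extraction", "EXTRACTION finished", and so on) can then run against the much shorter text field rather than the whole message.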

Also, removing temporary fields (message2 etc.) should be deferred until you know the patterns are working. And do not even try to index them into elasticsearch until you get good output when you run logstash on the command line with 'output { stdout { codec => rubydebug } }'.

Hamza_Dhahri · May 9, 2018, 7:44am

2018-02-07T10:42:06,865 [ ExtractDwhData] [INFO ] Extraction batch ID : 28
2018-02-07T10:42:02,832 [ ExtractDwhData] [INFO ] Starting DWH Data Extraction with run timestamp : 2018-02-07 10:42:02
2018-02-07T12:24:45,167 [ ExtractDwhData] [INFO ] Solife DWH data EXTRACTION finished in 1 hours, 42 minutes, 42.368 seconds

Those are the correct lines; I made a mistake.

Thanks for the answer. My log has 1600 lines;
I posted only those three because
they are the ones that contain the information I need.

Yes, I want the result as one line in the index table,
not three lines with empty columns in some of them.
What would my config look like if I used the aggregate filter? Thanks a lot.

Badger · May 9, 2018, 2:06pm

If there are ever multiple extractions occurring at the same time then this will not work, since there appears to be nothing in the log messages allowing you to correlate which is which.

This would allow you to combine those three lines.

filter {
  dissect { mapping => [ "message", '%{ts} [%{f1}] [%{loglevel}] %{text}' ] }
  mutate { add_field => { "static" => "1" } }
  if [text] =~ /^Extraction batch ID/ {
    grok { match => [ "text", "Extraction batch ID : %{NUMBER:ID_extraction_globale}" ] }
    aggregate {
      task_id => "%{static}"
      code => "map['id'] = event.get('ID_extraction_globale')"
    }
    drop {}
  }
  if [text] =~ /^Starting DWH Data Extraction with run timestamp/ {
    grok { match => [ "text", "Starting DWH Data Extraction with run timestamp : %{TIMESTAMP_ISO8601:runtimestamp}" ] }
    aggregate {
      task_id => "%{static}"
      code => "map['runtimestamp'] = event.get('runtimestamp')"
    }
    drop {}
  }
  if [text] =~ /^Solife DWH data EXTRACTION finished/ {
    mutate { gsub => [ "text", "Solife DWH data EXTRACTION finished in ", "", "text", " hours, ", ":", "text", " minutes, ", ":", "text", " seconds", "" ] }
    mutate { rename => { "text" => "duration" } }
    aggregate {
      task_id => "%{static}"
      code => "event.set('ID_extraction_globale', map['id'])
               event.set('runtimestamp', map['runtimestamp'])"
      map_action => "update"
    }
  }
  date { match => [ "ts" , "YYYY-MM-dd'T'HH:mm:ss,SSS" ] }
}

Hamza_Dhahri · May 9, 2018, 2:42pm

I got this error:

Hamza_Dhahri · May 9, 2018, 2:44pm

This is the config file:

Badger · May 9, 2018, 2:53pm

It appears to be joining the two lines. Add a semi-colon at the end of the first line.

code => "event.set('ID_extraction_globale', map['id']);
               event.set('runtimestamp', map['runtimestamp'])"

Hamza_Dhahri · May 9, 2018, 3:01pm

Sorry for taking your time,
but I get the same error and I don't know where it comes from,
or why the semicolon and the word "with" are highlighted in pink.

Hamza_Dhahri · May 9, 2018, 3:01pm

Hamza_Dhahri · May 9, 2018, 3:06pm

I just want to ask: where are the fields? I didn't find them in your patterns.

Badger · May 9, 2018, 4:09pm

Look at line 36. You have joined the two lines into one without a semicolon to separate the two statements. That will get you unexpected tIDENTIFIER all day long.

Hamza_Dhahri · May 9, 2018, 5:15pm

I added the semicolon and still get the same error. I think you didn't understand what I want to parse.

Badger · May 9, 2018, 5:21pm

Yeah. I used different names and formatted temps_totales differently. The important thing is that it joins data from the three lines into one event. Which data does not really matter.

Hamza_Dhahri · May 9, 2018, 8:04pm

The first config I wrote works, but the problem is that I get the result in three lines. I don't need the GREEDYDATA captures that I named message1, message2, and so on; every piece of information I need has a meaningful name (ID_extraction_globale, END_TIME, etc.).
Could you modify it by adding the aggregate filter in the right places, so that the result ends up in one line in the index, without making it more complicated? Thank you so much, Badger. I'm really stuck on this step in my internship.

Badger · May 9, 2018, 8:26pm

filter {
  dissect { mapping => [ "message", '%{[@metadata][ts]} [%{}] [%{}] %{[@metadata][text]}' ] }
  mutate { add_field => { "[@metadata][static]" => "1" } }
  if [@metadata][text] =~ /^Extraction batch ID/ {
    grok { match => [ "[@metadata][text]", "Extraction batch ID : %{NUMBER:ID_extraction_globale}" ] }
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "map['id'] = event.get('ID_extraction_globale');
               map['timestamp'] = event.get('[@metadata][ts]');"
    }
    drop {}
  }
  if [@metadata][text] =~ /^Starting DWH Data Extraction with run timestamp/ {
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "map['start_time_extraction_globale'] = event.get('[@metadata][ts]');"
    }
    drop {}
  }
  if [@metadata][text] =~ /^Solife DWH data EXTRACTION finished/ {
    grok { match => [ "[@metadata][text]", "Solife\sDWH\sdata\sEXTRACTION\sfinished\sin\s%{GREEDYDATA:temps_totales}" ] }
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "event.set('ID_extraction_globale', map['id']);
               event.set('timestamp', map['timestamp']);
               event.set('start_time_extraction_globale', map['start_time_extraction_globale']);"
      map_action => "update"
    }
  }
}
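
One thing to watch with this approach, sketched here under assumptions rather than tested against this log: the aggregate filter expires its maps after a timeout, which as far as I recall defaults to 1800 seconds, while the extraction in the sample ran for about 1 hour 42 minutes. If the log is tailed live instead of read in one go, the map could be gone before the "finished" line arrives. A variant of the last block that raises the timeout and releases the map once the final line has been seen might look like this (the 86400 value is an arbitrary illustration):

```
  if [@metadata][text] =~ /^Solife DWH data EXTRACTION finished/ {
    grok { match => [ "[@metadata][text]", "Solife\sDWH\sdata\sEXTRACTION\sfinished\sin\s%{GREEDYDATA:temps_totales}" ] }
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "event.set('ID_extraction_globale', map['id']);
               event.set('timestamp', map['timestamp']);
               event.set('start_time_extraction_globale', map['start_time_extraction_globale']);"
      map_action => "update"
      # Discard the stored map after the closing line, so a later run starts clean.
      end_of_task => true
      # Seconds to keep an unfinished map; illustrative value (one day).
      timeout => 86400
    }
  }
```

The aggregate filter also relies on events arriving in order, so its documentation asks for a single pipeline worker (for example, starting Logstash with -w 1).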

Hamza_Dhahri · May 11, 2018, 2:17pm

Hello, it works, but the problem is that my log has 2000 lines.
When I give the whole log as input, it parses all the lines.
What should I add so that only the three lines I want are parsed
(like the grok parse failure check)? Thank you for your time.

Badger · May 11, 2018, 2:30pm

If you want to throw away the entire file apart from the fields extracted from those 3 lines then change the end of the filter from

}
}

to be

  } else {
    drop {}
  }
}

Hamza_Dhahri · May 11, 2018, 3:29pm

filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:Start_time_conversion}%{GREEDYDATA:message1}\s+Starting\sstep\s2%{GREEDYDATA:message2}",
              "message", "%{TIMESTAMP_ISO8601:ENd_Time_Conversion}%{GREEDYDATA:message3}\s+End\sof\sXML\sto\sCSV\sconversion%{GREEDYDATA:message4}%{NUMBER:total_number}%{GREEDYDATA:message5}\s+were%{GREEDYDATA:status}\s+processed\sin\s%{GREEDYDATA:duration}"]
  }
}

And these are the lines:

2018-02-07T12:24:18,215 [ ExtractDwhData] [INFO ] Starting step 2 : XML transformation to CSV...
2018-02-07T12:24:36,071 [ XmlToCsvConverter] [INFO ] End of XML to CSV conversion. 31 XML files were successfully processed in 17.828 seconds

I need only:
start_time_conversion
end_time_conversion
duration
status
How can I get those fields into one line in the index, like you did the first time?
Thank you, I really appreciate it.
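
For reference, the same dissect plus aggregate approach from earlier in the thread could be adapted to these two lines roughly as follows. This is an untested sketch: it assumes only one conversion runs at a time (hence the constant task id "conversion", which is a made-up value), it captures duration as the bare number of seconds rather than the full "17.828 seconds" text, and it drops every other line.

```
filter {
  dissect { mapping => { "message" => "%{[@metadata][ts]} [%{}] [%{}] %{[@metadata][text]}" } }
  # Constant task id: there is nothing in the lines to correlate on.
  mutate { add_field => { "[@metadata][static]" => "conversion" } }

  if [@metadata][text] =~ /^Starting step 2/ {
    aggregate {
      task_id => "%{[@metadata][static]}"
      # Remember when the conversion started, then discard this event.
      code => "map['start_time_conversion'] = event.get('[@metadata][ts]');"
    }
    drop {}
  } else if [@metadata][text] =~ /^End of XML to CSV conversion/ {
    grok {
      match => { "[@metadata][text]" => "^End of XML to CSV conversion\. %{NUMBER:total_number} XML files were %{WORD:status} processed in %{NUMBER:duration} seconds" }
    }
    mutate { add_field => { "end_time_conversion" => "%{[@metadata][ts]}" } }
    aggregate {
      task_id => "%{[@metadata][static]}"
      # Copy the stored start time onto the final event and release the map.
      code => "event.set('start_time_conversion', map['start_time_conversion']);"
      map_action => "update"
      end_of_task => true
    }
  } else {
    drop {}
  }
}
```

For the sample lines above this should leave a single event carrying start_time_conversion, end_time_conversion, duration, status and total_number; total_number can be removed with mutate if it is not wanted.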

system · June 8, 2018, 3:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
