Regrouping multiple patterns into one line in the index

2018-02-07T10:42:02,831 [ ExtractDwhData] [INFO ] Solife :: Tools :: DWH :: ITK DWH - version : 2.6.0-SNAPSHOT - build #265 on 2018-01-04 08:22:32
2018-02-07T10:42:02,832 [ ExtractDwhData] [INFO ] Starting DWH Data Extraction with run timestamp : 2018-02-07 10:42:02
2018-02-07T12:24:45,167 [ ExtractDwhData] [INFO ] Solife DWH data EXTRACTION finished in 1 hours, 42 minutes, 42.368 seconds
hello everyone,

for those three lines I use this config file:

input {
  file {
    type => "test1"
    path => ["C:/Users/THINKPAD/Downloads/logstash-6.2.2/essai/."]
  }
}

filter {
  if [type] == "test1" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp}%{GREEDYDATA:message1}\s+Extraction\sbatch\sID\s:\s%{NUMBER:ID_extraction_globale}",
                 "message", "%{TIMESTAMP_ISO8601:start_time_extraction_globale}%{GREEDYDATA:message2}\sStarting\sDWH\sData\sExtraction%{GREEDYDATA:message3}",
                 "message", "%{TIMESTAMP_ISO8601:END_TIME}%{GREEDYDATA:message4}\sSolife\sDWH\sdata\sEXTRACTION\sfinished\sin%{GREEDYDATA:temps_totales}" ]
    }

    mutate {
      remove_field => [ "message1", "message2", "message3", "message4" ]
    }

    if "_grokparsefailure" in [tags] {
      drop {}
    }
  }
}

output {
  if [type] == "test1" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "globalextraction"
    }
    stdout {
      codec => rubydebug
    }
  }
}

I want to get the result of those 3 patterns into Elasticsearch in one line that holds all the information,
but the problem is that I get each pattern on a separate line, so in the Elasticsearch table I have 3 lines.
Can someone help me add a command to regroup the parsing results into one line? :slight_smile:

Please read what you posted, it is really hard to read. I do not understand why you have an "Extraction\sbatch\sID" pattern when none of your messages match it.

It sounds like an aggregate filter might be able to do this.

For the grok, personally I would prefer dissect, but that's not a big deal. Do a grok to split the timestamp off the message, then do a grok to ignore the "ExtractDwhData" and ignore the log level. Then you really do have unstructured data that grok might work for.
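For example, a dissect version of that first split could look something like this (a rough sketch; the field names ts, f1, loglevel and text are just placeholders):

filter {
  # peel off the timestamp, the thread name and the log level;
  # whatever is left of the line ends up in [text]
  dissect { mapping => [ "message", '%{ts} [%{f1}] [%{loglevel}] %{text}' ] }
}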

if "_grokparsefailure" in [tags] {
drop {}
}

This tends to be a bad idea. If your configuration fails to parse the data then you will usually be better off tagging it for review than dropping it.
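For example, rather than dropping them you could keep the tag and send the unparsed events to a separate index to review later. A rough sketch (the "failed-parses" index name is just an example):

output {
  if "_grokparsefailure" in [tags] {
    # events the grok could not parse go somewhere you can inspect them
    elasticsearch { hosts => ["localhost:9200"] index => "failed-parses" }
  } else {
    elasticsearch { hosts => ["localhost:9200"] index => "globalextraction" }
  }
}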

The reason I prefer not to use grok is that a GREEDYDATA anywhere except at the end of the message, such as

%{GREEDYDATA:message2}\sStarting\sDWH\sData\sExtraction%{GREEDYDATA:message3}

can get really expensive. The regexp engine has to step through the message one character at a time, checking at each position whether the rest of the pattern matches. This can lead to timeouts.
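If you do stay with grok, anchoring the pattern and spelling out the prefix avoids that character-by-character scan. An untested sketch for the "Starting" line (the thread and level field names are just illustrative):

grok {
  match => [ "message", "^%{TIMESTAMP_ISO8601:start_time_extraction_globale} \[\s*%{DATA:thread}\] \[%{LOGLEVEL:level}\s*\] Starting DWH Data Extraction" ]
}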

Also, removing the temporary fields (message2 etc.) should be deferred until you know the patterns are working. And do not even try to index the events into elasticsearch until you get good output when you run logstash on the command line with 'output { stdout { codec => rubydebug } }'.
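In other words, iterate with a throwaway pipeline along these lines, fed with the sample lines, until the rubydebug output looks right, and only then put the elasticsearch output back (a sketch; the test.conf name is just an example):

input { stdin {} }
filter {
  # the filter you are testing goes here
}
output { stdout { codec => rubydebug } }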

2018-02-07T10:42:06,865 [ ExtractDwhData] [INFO ] Extraction batch ID : 28
2018-02-07T10:42:02,832 [ ExtractDwhData] [INFO ] Starting DWH Data Extraction with run timestamp : 2018-02-07 10:42:02
2018-02-07T12:24:45,167 [ ExtractDwhData] [INFO ] Solife DWH data EXTRACTION finished in 1 hours, 42 minutes, 42.368 seconds

Those are the correct lines, I made a mistake.

Thanks for the answer, but I have a log with 1600 lines;
I posted those three lines because
they are the ones that contain the information.

Yes, I want the result in one line in the index table,
not three lines where some columns end up empty.
So what would my config look like if I use the aggregate filter? Thanks a lot

If there are ever multiple extractions occurring then this will not work, since there appears to be nothing in the log messages allowing you to correlate which is which.

This would allow you to combine those three lines.

filter {
  dissect { mapping => [ "message", '%{ts} [%{f1}] [%{loglevel}] %{text}' ] }
  mutate { add_field => { "static" => "1" } }
  if [text] =~ /^Extraction batch ID/ {
    grok { match => [ "text", "Extraction batch ID : %{NUMBER:ID_extraction_globale}" ] }
    aggregate {
      task_id => "%{static}"
      code => "map['id'] = event.get('ID_extraction_globale')"
    }
    drop {}
  }
  if [text] =~ /^Starting DWH Data Extraction with run timestamp/ {
    grok { match => [ "text", "Starting DWH Data Extraction with run timestamp : %{TIMESTAMP_ISO8601:runtimestamp}" ] }
    aggregate {
      task_id => "%{static}"
      code => "map['runtimestamp'] = event.get('runtimestamp')"
    }
    drop {}
  }
  if [text] =~ /^Solife DWH data EXTRACTION finished/ {
    mutate { gsub => [ "text", "Solife DWH data EXTRACTION finished in ", "", "text", " hours, ", ":", "text", " minutes, ", ":", "text", " seconds", "" ] }
    mutate { rename => { "text" => "duration" } }
    aggregate {
      task_id => "%{static}"
      code => "event.set('ID_extraction_globale', map['id'])
               event.set('runtimestamp', map['runtimestamp'])"
      map_action => "update"
    }
  }
  date { match => [ "ts" , "YYYY-MM-dd'T'HH:mm:ss,SSS" ] }
}

I got this error :frowning:

This is the config file:

It appears to be joining the two lines. Add a semi-colon at the end of the first line.

code => "event.set('ID_extraction_globale', map['id']);
               event.set('runtimestamp', map['runtimestamp'])"

Sorry for taking your time :cry:
but I get the same error; I don't know where the error is,
or why there is a semicolon and "with" shown in pink.

I also want to ask: where are the fields? I didn't find them in your patterns.

Look at line 36. You have joined the two lines into one without a semicolon to separate the two statements. That will get you 'unexpected tIDENTIFIER' errors all day long.

I added the semicolon and still get the same error; I think you didn't understand what I want to parse :slight_smile:

Yeah. I used different names and formatted temps_totales differently. The important thing is that it joins data from the three lines into one event. Which data does not really matter.

The first config I wrote works, but the problem is that I get the result in 3 lines. The GREEDYDATA fields I named "message1, message2, ..." are information I do not need; everything I need I give a meaningful name (ID_extraction_globale, END_TIME, etc.).
Could you modify it by adding the aggregate command in the correct places so the result ends up in one line in the index, without complicating it? Thank you so much Badger, I am really stuck at this step of my internship.

filter {
  dissect { mapping => [ "message", '%{[@metadata][ts]} [%{}] [%{}] %{[@metadata][text]}' ] }
  mutate { add_field => { "[@metadata][static]" => "1" } }
  if [@metadata][text] =~ /^Extraction batch ID/ {
    grok { match => [ "[@metadata][text]", "Extraction batch ID : %{NUMBER:ID_extraction_globale}" ] }
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "map['id'] = event.get('ID_extraction_globale');
               map['timestamp'] = event.get('[@metadata][ts]');"
    }
    drop {}
  }
  if [@metadata][text] =~ /^Starting DWH Data Extraction with run timestamp/ {
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "map['start_time_extraction_globale'] = event.get('[@metadata][ts]');"
    }
    drop {}
  }
  if [@metadata][text] =~ /^Solife DWH data EXTRACTION finished/ {
    grok { match => [ "[@metadata][text]", "Solife\sDWH\sdata\sEXTRACTION\sfinished\sin\s%{GREEDYDATA:temps_totales}" ] }
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "event.set('ID_extraction_globale', map['id']);
               event.set('timestamp', map['timestamp']);
               event.set('start_time_extraction_globale', map['start_time_extraction_globale']);"
      map_action => "update"
    }
  }
}

Hello, it works, but the problem is that I have a log with 2000 lines.
When I feed it the whole log it parses all the lines.
Which command should I add to parse only the 3 lines I want
(like the grok parse failure)? Thank you for your time

If you want to throw away the entire file apart from the fields extracted from those 3 lines then change the end of the filter from

  }
}

to be

  } else {
    drop {}
  }
}
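In context, the tail of the filter would end up looking roughly like this (only the last conditional is shown; the grok and aggregate inside it stay exactly as above):

  if [@metadata][text] =~ /^Solife DWH data EXTRACTION finished/ {
    # grok + aggregate from the config above, unchanged
  } else {
    drop {}
  }
}

Any line that matches none of the three conditionals falls into the else and is dropped; lines matching the first two conditionals are already dropped inside their own blocks.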

filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:Start_time_conversion}%{GREEDYDATA:message1}\s+Starting\sstep\s2%{GREEDYDATA:message2}",
               "message", "%{TIMESTAMP_ISO8601:ENd_Time_Conversion}%{GREEDYDATA:message3}\s+End\sof\sXML\sto\sCSV\sconversion%{GREEDYDATA:message4}%{NUMBER:total_number}%{GREEDYDATA:message5}\s+were%{GREEDYDATA:status}\s+processed\sin\s%{GREEDYDATA:duration}" ]
  }
}

and those are the lines

2018-02-07T12:24:18,215 [ ExtractDwhData] [INFO ] Starting step 2 : XML transformation to CSV...
2018-02-07T12:24:36,071 [ XmlToCsvConverter] [INFO ] End of XML to CSV conversion. 31 XML files were successfully processed in 17.828 seconds

I need only
start_time_conversion
end_time_conversion
duration
status
How can I get those fields into one line in the index like you did the first time?
Thank you, I really appreciate it.
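For what it is worth, the same dissect + aggregate idea from earlier in the thread should carry over to these two lines. A rough, untested sketch, shown standalone (field names taken from the list above; status and duration are pulled out of the "End of XML to CSV conversion" text; if you merge this into the earlier filter you only need the dissect and mutate once):

filter {
  dissect { mapping => [ "message", '%{[@metadata][ts]} [%{}] [%{}] %{[@metadata][text]}' ] }
  mutate { add_field => { "[@metadata][static]" => "1" } }
  if [@metadata][text] =~ /^Starting step 2/ {
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "map['start_time_conversion'] = event.get('[@metadata][ts]');"
    }
    drop {}
  }
  if [@metadata][text] =~ /^End of XML to CSV conversion/ {
    grok { match => [ "[@metadata][text]", "were %{WORD:status} processed in %{NUMBER:duration} seconds" ] }
    aggregate {
      task_id => "%{[@metadata][static]}"
      code => "event.set('start_time_conversion', map['start_time_conversion']);
               event.set('end_time_conversion', event.get('[@metadata][ts]'));"
      map_action => "update"
    }
  } else {
    drop {}
  }
}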

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.