I've got a log on our build cluster that starts with a job id line like the following:
"Job <7073381> is submitted to queue "
I'd like to extract the job id from that line and add it to all other matching lines that look like this:
xmelab: *W,CUSENMP: Use -NAMEMAP_MIXGEN with name mapped instantiation 'RBR_SV_PARAM_I'
You are using the file input plugin, so "Job <7073381> is submitted to queue" and "xmelab: *W,CUSENMP: Use -NAMEMAP_MIXGEN" are two different lines and therefore two different events.
You can also have other lines that match neither of the two grok patterns you gave us.
To do what you want, I think you have to use a ruby filter to save the job_id value so it can be reused on the other lines.
grok {
  match => {
    "message" => '%{GREEDYDATA} \<%{NUMBER:jobid}\> %{GREEDYDATA} \<%{WORD:queue}\>'
  }
  # If the match is verified, then add 'source_job_id_line' to the 'tags' field.
  add_tag => [ "source_job_id_line" ]
}
grok {
  match => {
    "message" => '%{WORD:process}: %{DATA:log_level},%{DATA:subprocess}: %{GREEDYDATA:logMessage}'
  }
  # If the match is verified, then add 'destination_job_id_line' to the 'tags' field.
  add_tag => [ "destination_job_id_line" ]
}
# If the current line contains a new job id
if "source_job_id_line" in [tags] {
  ruby {
    # Initialize jobId to -1 at Logstash startup time
    init => '@@jobId = -1'
    # Copy the content of the 'jobid' field into the class variable.
    # Note the double quotes inside the single-quoted code string:
    # nesting single quotes here would break the config.
    code => '@@jobId = event.get("jobid")'
    remove_tag => [ "source_job_id_line" ]
  }
}
# If the current line needs the job id
if "destination_job_id_line" in [tags] {
  ruby {
    code => '
      # Append the job id to the end of the line
      event.set("message", event.get("message") + @@jobId.to_s)
      # Also store it in a dedicated field
      event.set("jobid", @@jobId.to_s)
    '
    remove_tag => [ "destination_job_id_line" ]
  }
}
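As a rough sanity check, the two grok patterns above correspond to Ruby regexes roughly like the ones below, applied to the sample lines from the question. This is only a sketch for inspecting the capture groups; Logstash's grok engine is the real authority. The queue name "normal" is a made-up placeholder, since the original log line in the question does not show it.

```ruby
# Hypothetical sample lines; "normal" is an invented queue name.
source_line = 'Job <7073381> is submitted to queue <normal>'
dest_line   = "xmelab: *W,CUSENMP: Use -NAMEMAP_MIXGEN with name mapped instantiation 'RBR_SV_PARAM_I'"

# ~ %{GREEDYDATA} \<%{NUMBER:jobid}\> %{GREEDYDATA} \<%{WORD:queue}\>
SOURCE_RE = /<(?<jobid>\d+)>.*<(?<queue>\w+)>/
# ~ %{WORD:process}: %{DATA:log_level},%{DATA:subprocess}: %{GREEDYDATA:logMessage}
DEST_RE = /\A(?<process>\w+): (?<log_level>.*?),(?<subprocess>.*?): (?<logMessage>.*)\z/

m = SOURCE_RE.match(source_line)  # m[:jobid] and m[:queue] hold the captures
d = DEST_RE.match(dest_line)      # d[:process], d[:log_level], d[:subprocess], d[:logMessage]
```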
What it does:
First, I split the grok filter in two so that a different tag can be added depending on the line currently being read.
Then, depending on the value in the tags field, I either update the @@jobId class variable or use it.
I haven't tried the code, so it may not work on the first try. Also, this configuration needs the number of pipeline workers set to 1, to make sure that Logstash reads the file line by line in the correct order.
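The stateful part of the two ruby blocks can be sketched in plain Ruby outside Logstash: "source" lines update a shared job id, and "destination" lines get the most recent job id appended. The helper name and the space before the appended id are my own choices for illustration.

```ruby
# Plain-Ruby sketch of the filter logic: one shared variable carries the
# last seen job id from a "source" line to later "destination" lines.
def annotate_lines(lines)
  job_id = -1  # plays the role of init => '@@jobId = -1'
  lines.map do |line|
    if (m = line.match(/<(\d+)> is submitted to queue/))
      job_id = m[1]             # remember the job id for later lines
      line
    elsif line.start_with?('xmelab:')
      "#{line} #{job_id}"      # append the last seen job id
    else
      line                      # lines matching neither pattern pass through
    end
  end
end
```

This only works if lines are processed in file order, which is why the pipeline must run single-threaded (`bin/logstash -w 1`, or `pipeline.workers: 1` in the settings).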
Edit: As Badger explained in the reply, we need to use class variables.