Associating log lines with the timestamp line above them

Hi,

I am trying to associate the lines that follow a timestamp line in my logs with that timestamp.

Sample log:

06-26-18 03:58:51 AM
top - 03:58:51 up 84 days, 2:05, 0 users, load average: 0.01, 0.08, 0.12
Tasks: 835 total, 1 running, 833 sleeping, 0 stopped, 1 zombie
Cpu(s): 1.3%us, 0.4%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65695260k total, 54419828k used, 11275432k free, 286076k buffers
Swap: 6143996k total, 0k used, 6143996k free, 39933352k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15213 avendato 20 0 19.9g 670m 19m S 3.6 1.0 2902:06 /usr/java/default/bin/java -Djava.util.logging.config.file=/var/avenda/tomcat/backend/co
6889 appuser 20 0 13.7g 911m 33m S 2.6 1.4 2942:16 /usr/local/avenda/tips/sbin/policy_server

06-26-18 11:57:48 PM
6711 root 20 0 109m 1252 1040 S 0.0 0.0 5:48.79 awk /top - / { print strftime("%m-%d-%y %r")}1
6712 root 20 0 27020 1204 916 S 0.0 0.0 0:24.81 /usr/sbin/rotatelogs /var/avenda/platform/log/system-load-monitor//system-load.%Y-%m-%d.
7183 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/1:2]
8408 postgres 20 0 16.4g 17m 8160 S 0.0 0.0 0:00.05 postgres: appuser tipsdb [local] idle

06-26-18 11:57:48 PM
top - 23:57:48 up 84 days, 22:04, 0 users, load average: 0.33, 0.23, 0.23
Tasks: 833 total, 1 running, 831 sleeping, 0 stopped, 1 zombie
Cpu(s): 1.4%us, 0.4%sy, 0.0%ni, 98.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65695260k total, 55910064k used, 9785196k free, 286240k buffers
Swap: 6143996k total, 0k used, 6143996k free, 41379668k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8216 postgres 20 0 16.3g 50m 32m S 4.4 0.1 0:14.68 postgres: appsuperuser tipsdb 127.0.0.1(36820) idle
15213 avendato 20 0 19.9g 671m 19m S 4.0 1.0 2943:49 /usr/java/default/bin/java -

I want each line to be associated with the timestamp that appears above it. Could someone please help me make that association?

Below is what I tried out:

file {
    sincedb_path => "/dev/null"
    path => ["///xyz-2018-12-25.log"]
    start_position => "beginning"
    type => "log-top"
    tags => ["top"]
    codec => multiline {
        patterns_dir => ["//etc/logstash/conf.d/patterns/patterns.conf"]
        pattern => "^%{DATETIME}"
        negate => true
        what => "previous"
    }
}

filter {
    if [type] == "log-top" {

        if ("PID USER" in [message] or [message] == "") {
            drop {}
        }
        else if [message] =~ /( AM)/ or [message] =~ /( PM)/ {
            grok {
                patterns_dir => ["//etc/logstash/conf.d/patterns/patterns.conf"]
                match => ["message", "%{DATETIME:logtime}"]
            }
            date {
                match => ["message", "MM-dd-yy hh:mm:ss aa"]
                target => "@timestamp"
                locale => "en"
            }
        }
        else if [message] =~ /(top -.*)/ {
            grok {
                patterns_dir => ["//etc/logstash/conf.d/patterns/patterns.conf"]
                match => ["message", "%{NOTSPACE:junk} +%{NOTSPACE:junk} %{HOUR:hour}:%{MINUTE:minute}:%{SECOND:seconds} +%{NOTSPACE:state} +%{SINCE:since}, +%{NUMBER:num-of-users:int} +%{NOTSPACE:junk} +%{NOTSPACE:junk} +%{NOTSPACE:junk} +%{NUMBER:load-avg-5min:float}, +%{NUMBER:load-avg-10min:float}, +%{NUMBER:load-avg-15min:float}"]
            }
        }
        else if [message] =~ /(Tasks:.*)/ {
            grok {
                match => ["message", "%{NOTSPACE:junk} %{NUMBER:total-task:int} +%{NOTSPACE:junk}, +%{NUMBER:running-task:int} +%{NOTSPACE:junk}, +%{NUMBER:sleeping:int} +%{NOTSPACE:junk}, +%{NUMBER:stopped:int} +%{NOTSPACE:junk}, +%{NUMBER:zombie:int} +%{NOTSPACE:junk}"]
            }
        }
        else if [message] =~ /(Cpu\(s\):.*)/ {
            grok {
                match => ["message", "%{NOTSPACE:junk} +%{NUMBER:user-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:sys-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:nice-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:idle-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:wait-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:hi-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:si-cpu-percent:float}%+%{NOTSPACE:junk}, +%{NUMBER:st-cpu-percent:float}%+%{NOTSPACE:junk}"]
            }
        }
        else if [message] =~ /(Mem:.*)/ {
            grok {
                match => ["message", "%{NOTSPACE:junk} +%{NUMBER:total-mem:int}+%{NOTSPACE:total-mem-unit} +%{NOTSPACE:junk} +%{NUMBER:used-mem:int}+%{NOTSPACE:used-mem-unit} +%{NOTSPACE:junk} +%{NUMBER:free-mem:int}+%{NOTSPACE:free-mem-unit} +%{NOTSPACE:junk} +%{NUMBER:buf-mem:int}+%{NOTSPACE:buf-mem-unit} +%{NOTSPACE:junk}"]
            }
        }
        else if [message] =~ /(Swap:.*)/ {
            grok {
                match => ["message", "%{NOTSPACE:junk} +%{NUMBER:total-swap:int}+%{NOTSPACE:total-swap-unit} +%{NOTSPACE:junk} +%{NUMBER:used-swap:int}+%{NOTSPACE:used-swap-unit} +%{NOTSPACE:junk} +%{NUMBER:free-swap:int}+%{NOTSPACE:free-swap-unit} +%{NOTSPACE:junk} +%{NUMBER:cache-swap:int}+%{NOTSPACE:cache-swap-unit} +%{NOTSPACE:junk}"]
            }
        }
        else {
            grok {
                match => ["message", "%{NUMBER:process-id:int} +%{NOTSPACE:user} +%{NOTSPACE:priority} +%{NUMBER:nice} +%{NOTSPACE:virt-mem-size} +%{NOTSPACE:reside-mem-size} +%{NOTSPACE:shared-mem-size} +%{NOTSPACE:process-status} +%{NUMBER:proc-cpu-use-percent:float} +%{NUMBER:proc-mem-use-percent:float} +%{NOTSPACE:cpu-time} +%{GREEDYDATA:command}"]
            }
        }

        mutate { remove_field => ["@version", "path", "host", "junk_perc", "junk"] }
    }
}

I have defined the DATETIME pattern in my patterns file as: DATETIME %{DATE} +%{HOUR}:%{MINUTE}:%{SECOND} +%{WORD}

I am facing an issue with the multiline codec association: the output still shows Logstash's own run time as the timestamp for the lines that do not carry a timestamp themselves (marked in bold in the sample input).

You are matching "message" in the date filter; it should match the logtime field that your grok extracts.
Try this:

date {
    match => ["logtime", "MM-dd-yy hh:mm:ss aa"]
    target => "@timestamp"
    locale => "en"
}

However, if you want to align each line with the same date/time, then you need to use the multiline codec and declare that a new entry begins whenever the datetime pattern matches again.
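For reference, a minimal codec sketch (it mirrors the pattern/negate/what settings already in your input; auto_flush_interval is a codec option worth adding, since without it the last buffered event is only flushed when more input arrives, which can leave it with Logstash's own timestamp):

```
codec => multiline {
    patterns_dir => ["/etc/logstash/conf.d/patterns/patterns.conf"]
    pattern => "^%{DATETIME}"    # a new event starts at each timestamp line
    negate => true               # lines that do NOT match the pattern...
    what => "previous"           # ...are appended to the previous event
    auto_flush_interval => 5     # flush the last buffered event after 5s of inactivity
}
```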

Thank you, Andreas, that worked. However, I am still facing an issue connecting an indefinite number of lines that share the same pattern.

04/03/2018 15:01:38 AM
avg-cpu: %user %nice %system %iowait %steal %idle
0.71 0.00 0.99 0.62 0.00 97.67

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda-1 0.74 71.22 491.72 3.69 6652.83 603.34 14.65 0.56 1.13 0.29 14.53
sda-2 0.00 0.00 0.25 0.00 1.97 0.00 8.00 0.00 0.00 0.00 0.00
....

Can I get the grok filter, which currently matches only the first device line, to work for all of the lines below it as well?
This is the filter:
split {
    field => "message"
    terminator => "\n"
}
grok {
    match => ["message", "\n%{WORD:device} +%{NUMBER:read_request_merge_avg:float} +%{NUMBER:write_request_merge_avg:float} +%{NUMBER:read_iops_avg:float} +%{NUMBER:write_iops_avg:float} +%{NUMBER:MB_read_avg:float} +%{NUMBER:MB_write_avg:float} +%{NUMBER:avg_sector_size:float} +%{NUMBER:avg_queue_size:float} +%{NUMBER:io_wait_time_ms:float} +%{NUMBER:io_service_time_ms:float} +%{NUMBER:disk_util_perc:float}"]
}

I have also used the multiline codec as you suggested, and with it I am able to make the timestamp association.
Please help.

Thanks a lot again :slight_smile:

Just use multiple grok patterns, either in the same match array or with an if statement that looks for the "_grokparsefailure" tag.

# 1st grok
grok {
    match => {
        "message" => ["FIRSTPATTERN", "SECONDPATTERN", "NthPATTERN"]
    }
}

if "_grokparsefailure" in [tags] {
    grok {
        remove_tag => ["_grokparsefailure"]
        match => {
            "message" => ["ANOTHERPATTERN", "BONUSPATTERN", "NPLUS1PATTERN"]
        }
    }
}
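For example, with the placeholder patterns filled in (these two are only illustrative, adapted from the Tasks and Swap lines earlier in this thread):

```
grok {
    match => {
        "message" => [
            "Tasks: +%{NUMBER:total-task:int} total, +%{NUMBER:running-task:int} running",
            "Swap: +%{NUMBER:total-swap:int}%{WORD:total-swap-unit} total"
        ]
    }
}
```

Grok tries the patterns in order and, by default (break_on_match), stops at the first one that matches, so each event gets at most one of these applied.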

The issue I am facing is not with using different grok filters, but with applying the same filter to n lines that all match the pattern, when I don't know in advance how many such lines there will be. In the example I stated:
sda-1 0.74 71.22 491.72 3.69 6652.83 603.34 14.65 0.56 1.13 0.29 14.53
sda-2 0.00 0.00 0.25 0.00 1.97 0.00 8.00 0.00 0.00 0.00 0.00
...

are the first two lines, after which there can be many more, and all of them match this grok filter:
grok {
    match => ["message", "\n%{WORD:device} +%{NUMBER:read_request_merge_avg:float} +%{NUMBER:write_request_merge_avg:float} +%{NUMBER:read_iops_avg:float} +%{NUMBER:write_iops_avg:float} +%{NUMBER:MB_read_avg:float} +%{NUMBER:MB_write_avg:float} +%{NUMBER:avg_sector_size:float} +%{NUMBER:avg_queue_size:float} +%{NUMBER:io_wait_time_ms:float} +%{NUMBER:io_service_time_ms:float} +%{NUMBER:disk_util_perc:float}"]
}
but I can't just repeat the same filter, because I don't know how many such lines will occur. Sometimes it might be just one, sometimes four.

Can you help me make the grok filter above work for all matching lines of the sample input, without repeating the filter again and again?

Help appreciated :slight_smile:

Thank you.

Yeah, the multiline codec will still work: you can include all of the possible grok filters and only consider the multiline event finished once the next date/time string is read. Think of the multiline event like a loop in which you run through the grok filters until you hit the next date/time. However, based on this info, I'm not sure Logstash is the right tool for you here. You may want to look into something like Telegraf and InfluxDB, since they handle this kind of time-series performance data much better.
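As a rough sketch of the split route you already started (untested, and abbreviated to the first few columns with a hypothetical remaining_columns field): after the split each event holds a single line, so the leading \n in your grok pattern no longer matches anything and should be dropped.

```
split {
    field => "message"
    terminator => "\n"
}

grok {
    # after split, this runs once per line, however many device rows there are;
    # NOTSPACE instead of WORD so hyphenated names like "sda-1" match in full
    match => ["message", "^%{NOTSPACE:device} +%{NUMBER:read_request_merge_avg:float} +%{NUMBER:write_request_merge_avg:float} +%{GREEDYDATA:remaining_columns}"]
}
```

Header lines such as "Device:" or "avg-cpu:" will fail this grok and pick up the _grokparsefailure tag, which you can use to drop them; the purely numeric avg-cpu data row would need an extra conditional, since it also starts with a token followed by numbers.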

Thank you so much Andreas. I will check them out too.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.