Need Help Parsing PS command Output using logstash

My input file looks like the one below.

Linux VM Ubuntu

zzz ***Thu Jul 23 13:00:11 UTC 2020
USER        PID   PPID PRI %CPU %MEM    VSZ   RSS WCHAN             S  STARTED     TIME COMMAND
root   124829 124101  19 30.4  8.3 37616628 8469316 futex_wait_queue_ S 10:41:23 00:42:14 some string -weblogic.Name=SOAPServer1  some string
root    20162  19530  19 67.2  7.6 35619736 7688116 futex_wait_queue_ S 11:50:28 00:46:52 some string -weblogic.Name=UIServer1 some string
root    41816  41791  19  2.1  3.7 4247708 3758620 futex_wait_queue_ S   Jul 22 00:42:16 reportserver -active
root    32372  31778  19  8.6  3.6 24847800 3649564 futex_wait_queue_ S   Jul 14 18:55:06 some string -weblogic.Name=AdminServer1 some string
root    41042  38613  19  3.5  3.2 33886276 3249036 futex_wait_queue_ S 14:31:11 00:47:58 some string -weblogic.Name=export_server1 some string
zzz ***Thu Jul 23 13:00:42 UTC 2020
USER        PID   PPID PRI %CPU %MEM    VSZ   RSS WCHAN             S  STARTED     TIME COMMAND
root   124829 124101  19 30.3  8.4 37620728 8470088 futex_wait_queue_ S 10:41:23 00:42:18 some string -Dweblogic.Name=SOAPServer1  some string
root   123010 122385  19 32.6  8.3 37572960 8370440 futex_wait_queue_ S 10:40:33 00:45:46 some string -weblogic.Name=import_server1 some string  
root    20162  19530  19 66.8  7.6 35619736 7688240 futex_wait_queue_ S 11:50:28 00:46:56 some string -weblogic.Name=UIServer1 some string
root    41816  41791  19  2.1  3.7 4247708 3758708 futex_wait_queue_ S   Jul 22 00:42:48 reportserver -active
root    32372  31778  19  8.6  3.6 24847800 3653908 futex_wait_queue_ S   Jul 14 18:55:07 some string -weblogic.Name=AdminServer1 some string

I'm trying to create an output CSV like the one below, with the columns "tags","time_value","user_name","pid","ppid","pri","percent_cpu","percent_mem","vsz","rss","wchan","s","started","time","command".

13:00:11,root,124829,124101,19,30.4,8.3,37616628,8469316,futex_wait_queue_,S,10:41:23,00:42:14,SOAPServer1
13:00:11,root,20162,19530,19,67.2,7.6,35619736,7688116,futex_wait_queue_,S,11:50:28,00:46:52,UIServer1
13:00:11,root,41816,41791,19,2.1,3.7,4247708,3758620,futex_wait_queue_,S,Jul,22,00:42:16 reportserver
13:00:11,root,32372,31778,19,8.6,3.6,24847800,3649564,futex_wait_queue_,S,Jul,14,18:55:06,AdminServer1
13:00:11,root,41042,38613,19,3.5,3.2,33886276,3249036,futex_wait_queue_,S,14:31:11,00:47:58,export_server1
13:00:42,root,124829,124101,19,30.3,8.4,37620728,8470088,futex_wait_queue_,S,10:41:23,00:42:18,SOAPServer1
13:00:42,root,123010,122385,19,32.6,8.3,37572960,8370440,futex_wait_queue_,S,10:40:33,00:45:46,import_server1
13:00:42,root,20162,19530,19,66.8,7.6,35619736,7688240,futex_wait_queue_,S,11:50:28,00:46:56,UIServer1
13:00:42,root,41816,41791,19,2.1,3.7,4247708,3758708,futex_wait_queue_,S,Jul,22,00:42:48,reportserver
13:00:42,root,32372,31778,19,8.6,3.6,24847800,3653908,futex_wait_queue_,S,Jul,14,18:55:07,AdminServer1

I've written the configuration file below.

input {
     file {
       type => "oswpslog"
       tags => "oswps"
       path => [ "/home/ps.log" ]
	   codec => multiline {
			pattern => "^zzz"
			negate => "true"
			what => "previous"
       }
	   ## For debugging
       start_position => "beginning"
       sincedb_path => "NUL" #Setting sincedb_path => "NUL" (in windows) OR "/dev/null" in linux causes logStash to read the old lines (before starting log stash as well as the new lines in input file
	   
     }
}
filter {
	 grok {
			  match => { "message" => "z+\s+(\*+)%{WORD:week_day} %{MONTH:month_string} %{NUMBER:month_day} %{TIME:time_value} %{TZ:timezone} %{YEAR:year_number}%{GREEDYDATA:test_saurabhc1}%{OSWPSHEADINGS:headings}%{GREEDYDATA}%{WORD:user_name}%{SPACE}%{NUMBER:pid}%{SPACE}%{NUMBER:ppid}%{SPACE}%{NUMBER:pri}%{SPACE}%{NUMBER:percent_cpu}%{SPACE}%{NUMBER:percent_mem}%{SPACE}%{NUMBER:vsz}%{SPACE}%{NUMBER:rss}%{SPACE}%{WORD:wchan}%{SPACE}%{WORD:s}%{SPACE}%{TIME:started}%{SPACE}%{TIME:time}%{SPACE}%{DATA:command}"}
	 }
}
output {
	   csv {
		  path => "/home/outlog_ps.csv"
		  fields => ["tags","time_value","user_name","pid","ppid","pri","percent_cpu","percent_mem","vsz","rss","wchan","s","started","time","command"]
	   }
}

The output contains only two successfully parsed lines (instead of the 10 lines I need), as shown below. Also, the "command" column needs to be trimmed to just the text shown in the expected output above.
Output from the above config file:

"[""oswps"", ""_grokparsefailure""]",,,,,,,,,,,,,,
"[""multiline"", ""oswps""]",13:00:11,oracle,41042,38613,19,3.5,3.2,33886276,3249036,futex_wait_queue_,S,14:31:11,00:47:58,
"[""multiline"", ""oswps""]",13:00:42,oracle,20162,19530,19,66.8,7.6,35619736,7688240,futex_wait_queue_,S,11:50:28,00:46:56,

This is not the output I expect.
I need help with the above.

You are using a multiline codec to join together all of the lines after a line that starts with zzz. There are only two of those.

Ah, yes! Thank you for pointing this out!
Alternatively, can I parse each of the two multiline events multiple times, once per pattern match?
What I want is as follows.
Say I have the two grok filters below.

match => { "message" => "z+\s+(\*+)%{WORD:week_day} %{MONTH:month_string} %{NUMBER:month_day} %{TIME:time_value} %{TZ:timezone} %{YEAR:year_number}%{GREEDYDATA:test_saurabhc1}%{WORD:user_name}%{SPACE}%{NUMBER:pid}%{SPACE}%{NUMBER:ppid}%{SPACE}%{NUMBER:pri}%{SPACE}%{NUMBER:percent_cpu}%{SPACE}%{NUMBER:percent_mem}%{SPACE}%{NUMBER:vsz}%{SPACE}%{NUMBER:rss}%{SPACE}%{WORD:wchan}%{SPACE}%{WORD:s}%{SPACE}%{TIME:started}%{SPACE}%{TIME:time}%{GREEDYDATA}(?<command>SOAPServer[0-9A-Za-z]+)"}
match => { "message" => "z+\s+(\*+)%{WORD:week_day} %{MONTH:month_string} %{NUMBER:month_day} %{TIME:time_value} %{TZ:timezone} %{YEAR:year_number}%{GREEDYDATA:test_saurabhc1}%{WORD:user_name}%{SPACE}%{NUMBER:pid}%{SPACE}%{NUMBER:ppid}%{SPACE}%{NUMBER:pri}%{SPACE}%{NUMBER:percent_cpu}%{SPACE}%{NUMBER:percent_mem}%{SPACE}%{NUMBER:vsz}%{SPACE}%{NUMBER:rss}%{SPACE}%{WORD:wchan}%{SPACE}%{WORD:s}%{SPACE}%{TIME:started}%{SPACE}%{TIME:time}%{GREEDYDATA}(?<command>UIServer[0-9A-Za-z]+)"}

So for each of the two multiline events I should get two matches, and my output CSV should contain four lines.

13:00:11,root,124829,124101,19,30.4,8.3,37616628,8469316,futex_wait_queue_,S,10:41:23,00:42:14,SOAPServer1
13:00:11,root,20162,19530,19,67.2,7.6,35619736,7688116,futex_wait_queue_,S,11:50:28,00:46:52,UIServer1
13:00:42,root,124829,124101,19,30.3,8.4,37620728,8470088,futex_wait_queue_,S,10:41:23,00:42:18,SOAPServer1
13:00:42,root,20162,19530,19,66.8,7.6,35619736,7688240,futex_wait_queue_,S,11:50:28,00:46:56,UIServer1

That is, each event should be parsed once per grok pattern. So, if there are three different patterns, each event should be searched three times.

Basically, the reason I'm persisting with the multiline codec is that I want the timestamp for each process. Also, a given ps snapshot may not contain all of the required processes.

If you have the two events, one for each zzz, then you can process them using this:

    dissect { mapping => { "message" => "%{} %{} %{} %{} %{time_value} %{}" } }
    # Literal newline in configuration. Split multiline input into array
    mutate { split => { "message" => "
" } }
    # Split array into separate events
    split { field => "message" }

    # After this, we have one event for each line of ps output
    if [message] =~ /^(zzz|USER)/ { drop {} }

    # Compress multiple spaces into one
    mutate { gsub => [ "message", "\s+", " " ] }

    dissect { mapping => { "message" => "%{user_name} %{pid} %{ppid} %{pri} %{percent_cpu} %{percent_mem} %{vsz} %{rss} %{wchan} %{s} %{[@metadata][restOfLine]}" } }

    grok {
        pattern_definitions => { "PSDATE" => "(%{MONTH} %{MONTHDAY}|%{TIME})" }
        match => { "[@metadata][restOfLine]" => "^%{PSDATE:started} %{GREEDYDATA:[@metadata][theEnd]}" }
    }
    grok {
        match => { "[@metadata][theEnd]" => "%{TIME:time} %{GREEDYDATA:command}" }
    }

That will get you events like

  "user_name" => "root",
        "rss" => "3653908",
      "wchan" => "futex_wait_queue_",
          "s" => "S",
 "time_value" => "13:00:42",
    "started" => "Jul 14",
       "time" => "18:55:07",
        "vsz" => "24847800",
"percent_cpu" => "8.6",
    "command" => "some string -weblogic.Name=AdminServer1 some string",
        "pri" => "19",
"percent_mem" => "3.6",
 "@timestamp" => 2020-08-07T16:25:51.945Z,
       "ppid" => "31778",
        "pid" => "32372"

and you can just write them out with a csv output.
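For example, a csv output along the lines of the one in the original config (same path and field list as in the question, minus the tags column; dropping it is my assumption, since the expected CSV above does not show it):

    output {
        csv {
            path => "/home/outlog_ps.csv"
            fields => ["time_value","user_name","pid","ppid","pri","percent_cpu","percent_mem","vsz","rss","wchan","s","started","time","command"]
        }
    }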

The first grok pattern may need a slight adjustment, depending on whether the first 9 days of the month show up as "Aug 8" (one space) or "Aug  8" (two spaces).

    pattern_definitions => { "PSDATE" => "(%{MONTH}\s+%{MONTHDAY}|%{TIME})" }

would probably work for both.
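Putting the pieces together, the whole filter section from this thread (using the whitespace-tolerant PSDATE definition) would look something like this sketch:

    filter {
        # Grab the timestamp from the zzz header line of the multiline event
        dissect { mapping => { "message" => "%{} %{} %{} %{} %{time_value} %{}" } }
        # Literal newline in the configuration: split the multiline message into an array
        mutate { split => { "message" => "
    " } }
        # Turn the array into one event per line of ps output
        split { field => "message" }
        # Drop the zzz header and the column-heading line
        if [message] =~ /^(zzz|USER)/ { drop {} }
        # Compress runs of whitespace into single spaces
        mutate { gsub => [ "message", "\s+", " " ] }
        dissect { mapping => { "message" => "%{user_name} %{pid} %{ppid} %{pri} %{percent_cpu} %{percent_mem} %{vsz} %{rss} %{wchan} %{s} %{[@metadata][restOfLine]}" } }
        grok {
            pattern_definitions => { "PSDATE" => "(%{MONTH}\s+%{MONTHDAY}|%{TIME})" }
            match => { "[@metadata][restOfLine]" => "^%{PSDATE:started} %{GREEDYDATA:[@metadata][theEnd]}" }
        }
        grok { match => { "[@metadata][theEnd]" => "%{TIME:time} %{GREEDYDATA:command}" } }
    }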

Apologies for the delayed reply; I'm currently unwell. I'll try to work with your solution once I'm better.

Hi, thanks a lot for your solution.
But I ran into a different problem while trying to split my multiline message on newlines. I tried all of the options below, but unfortunately none of them split the multiline message.

  1. mutate { split => { "message" => "
    " } }
  2. mutate { split => { "message" => "\n" } }
  3. ruby {code => 'message_array = event.get("message").split("\n")'}
  4. ruby {code => 'message_array = event.get("message").split("
    ")'}

In my output I see my multiline message as below, with the individual lines separated by '\n'.
"message":["zzz ***Thu Jul 23 13:00:42 UTC 2020\nUSER PID PPID COMMAND\nroot 124829 124101 -Xmx12288m\nroot 123010 122385 -Xmx14336m"]

Note: I've removed some text to make the message look smaller.

Your message is an array, so try either

mutate { replace => { "message" => "%{[message][0]}" } }

or

mutate { split => { "[message][0]" => "
" } }

Thank You!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.