Logstash--Need to create fields from Multi line log message which not able to get

Hello,

Below is my incoming single log message:

[2024-02-14T03:29:07.962+0000][118][safepoint   ] Safepoint "Cleanup", Time since last: 1000174402 ns, Reaching safepoint: 9291 ns, Cleanup: 83971 ns, At safepoint: 14116 ns, Total: 107378 ns
[2024-02-14T03:29:08.373+0000][118][gc,start    ] GC(16) Pause Young (Concurrent Start) (Metadata GC Threshold)
[2024-02-14T03:29:08.373+0000][118][gc,task     ] GC(16) Using 4 workers of 4 for evacuation
[2024-02-14T03:29:08.373+0000][118][gc,age      ] GC(16) Desired survivor size 40894464 bytes, new threshold 15 (max threshold 15)
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) Age table with threshold 15 (max threshold 15)
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   1:   13258944 bytes,   13258944 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   2:    1069200 bytes,   14328144 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   3:     754040 bytes,   15082184 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   4:     118080 bytes,   15200264 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   5:     322488 bytes,   15522752 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   6:     262400 bytes,   15785152 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   7:     431192 bytes,   16216344 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   8:     269312 bytes,   16485656 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age   9:    3006824 bytes,   19492480 total
[2024-02-14T03:29:08.387+0000][118][gc,age      ] GC(16) - age  10:     177680 bytes,   19670160 total
[2024-02-14T03:29:08.387+0000][118][gc,phases   ] GC(16)   Pre Evacuate Collection Set: 0.2ms
[2024-02-14T03:29:08.387+0000][118][gc,phases   ] GC(16)   Merge Heap Roots: 0.2ms
[2024-02-14T03:29:08.387+0000][118][gc,phases   ] GC(16)   Evacuate Collection Set: 12.5ms
[2024-02-14T03:29:08.387+0000][118][gc,phases   ] GC(16)   Post Evacuate Collection Set: 1.2ms
[2024-02-14T03:29:08.387+0000][118][gc,phases   ] GC(16)   Other: 0.4ms
[2024-02-14T03:29:08.387+0000][118][gc,heap     ] GC(16) Eden regions: 180->0(297)
[2024-02-14T03:29:08.387+0000][118][gc,heap     ] GC(16) Survivor regions: 4->10(39)
[2024-02-14T03:29:08.387+0000][118][gc,heap     ] GC(16) Old regions: 20->20
[2024-02-14T03:29:08.387+0000][118][gc,heap     ] GC(16) Archive regions: 2->2
[2024-02-14T03:29:08.387+0000][118][gc,heap     ] GC(16) Humongous regions: 3->3
[2024-02-14T03:29:08.387+0000][118][gc,metaspace] GC(16) Metaspace: 36172K(36608K)->36172K(36608K) NonClass: 32501K(32704K)->32501K(32704K) Class: 3671K(3904K)->3671K(3904K)
[2024-02-14T03:29:08.387+0000][118][gc          ] GC(16) Pause Young (Concurrent Start) (Metadata GC Threshold) 413M->67M(1024M) 14.537ms
[2024-02-14T03:29:08.387+0000][118][gc,cpu      ] GC(16) User=0.05s Sys=0.00s Real=0.02s

I need to create some fields out of the above single incoming log. So i am splitting by lines and trying to create fields with below filter:


filter {
	if [fields][hc_type] and [fields][hc_type] == "gc-log" 
	{
		mutate {
			copy => {
				"message" => "message_copy"
			}
		}
	
		split {field => "message_copy"}
	
		if "Eden" in [message_copy]
		{	
			mutate 
			{
				copy => 	
				{
					"message_copy" => "eden_copy"
				}
			}
	
		   grok 
			{
			match => ["eden_copy", "(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE:gcinfo}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) %{NOTSPACE:gcspace} %{NOTSPACE} %{NOTSPACE:beforegc}\-\>%{NOTSPACE:aftergc}\(%{NOTSPACE:total}\))"]
			}
		}
		Else if "Survivor" in [message_copy]
		{	
			mutate 
			{
				copy => 	
				{
					"message_copy" => "Survivor_copy"
				}
			}
	
		   grok 
			{
			match => ["Survivor_copy", "(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE:gcinfo}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) %{NOTSPACE:gcspace} %{NOTSPACE} %{NOTSPACE:beforegc}\-\>%{NOTSPACE:aftergc}\(%{NOTSPACE:total}\))"]
			}
		}
	}	
}

Able to create fields from a single line. Looking into below now:

If log is 10 lines, 10 different messages are getting created and fields from each line are getting created as a separate msg. How to get all fields from different lines into a single msg? Pls let me know

The key to this is to get rid of the split filter, and work with the original message. We can start off with the pattern from your existing grok filter, but make it more specific so that it only matches the Eden region, then add a very similar pattern that matches Survivor. That means we replaced the pattern that matches [gcspace] with a fixed string.

Eden/Survivior would both create fields called [aftergc] and [beforegc], so we need to rename them. We can tell grok to match every pattern even if one has already matched using the "break_on_match" option. Let's try

        break_on_match => false
        match => {
            message => [
                "(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE:gcinfo}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Eden regions: %{NOTSPACE:[heap][eden][beforegc]}\-\>%{NOTSPACE:[heap][eden][aftergc]}\(%{NOTSPACE:[heap][eden][total]}\))",
                "(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE:gcinfo}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Survivor regions: %{NOTSPACE:[heap][survivor][beforegc]}\-\>%{NOTSPACE:[heap][survivor][aftergc]}\(%{NOTSPACE:[heap][survivor][total]}\))"
            ]
        }

which gets us

     "timestamp" => [
    [0] "2024-02-14T03:29:08.387+0000",
    [1] "2024-02-14T03:29:08.387+0000"
],
        "id" => [
    [0] "118",
    [1] "118"
],
     "gcnum" => [
    [0] "16",
    [1] "16"
],
      "heap" => {
    "survivor" => {
        "beforegc" => "4",
         "aftergc" => "10",
           "total" => "39"
    },
        "eden" => {
        "beforegc" => "180",
         "aftergc" => "0",
           "total" => "297"
    }
},
"@timestamp" => 2024-09-20T14:15:39.653331628Z,
    "gcinfo" => [
    [0] "gc,heap",
    [1] "gc,heap"
]

The [timestamp], [id], and [gcnum] fields are going to be the same on every line (timestamp may vary insignificantly), so we will just keep one of those using

        overwrite => [ "gcnum", "id", "timestamp" ]

The [gcinfo] field helps you identify which lines in the log are which, but we are embedding that knowledge in the structure of the event ([heap][eden] etc.), so I do not think it adds value.

We can optimize the patterns a little by anchoring them using ^ at the beginning, so that they only match at start of line (and yes, in a multiline message you still have several starts of line). We can also replace NOTSPACE with NUMBER in some cases. This all speeds up failures to match.

So now we have

        break_on_match => false
        overwrite => [ "gcnum", "id", "timestamp" ]
        match => {
            message => [
                "^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Eden regions: %{NUMBER:[heap][eden][beforegc]}\-\>%{NUMBER:[heap][eden][aftergc]}\(%{NUMBER:[heap][eden][total]}\))",
                "^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Survivor regions: %{NUMBER:[heap][survivor][beforegc]}\-\>%{NUMBER:[heap][survivor][aftergc]}\(%{NUMBER:[heap][survivor][total]}\))"
            ]
        }

which gets us

 "timestamp" => "2024-02-14T03:29:08.387+0000",
        "id" => "118",
     "gcnum" => "16",
      "heap" => {
    "survivor" => {
        "beforegc" => "4",
         "aftergc" => "10",
           "total" => "39"
    },
        "eden" => {
        "beforegc" => "180",
         "aftergc" => "0",
           "total" => "297"
    }
},

You can continue adding patterns one by one to match every single line of the entire GC log entry. A little later I will post explaining how you can take another approach for most of the lines.

ETA: You can change %{NUMBER:[heap][eden][aftergc]} to %{NUMBER:[heap][eden][aftergc]:int} to make grok do the conversion of strings to integers.

1 Like

Sometimes you have several log entries that look very similar. Sometimes you cannot get a good outcome when parsing these using grok. For example, your entries like

[gc,heap     ] GC(16) Eden regions: 180->0(297)
[gc,heap     ] GC(16) Survivor regions: 4->10(39)

In this case grok can do it, but for the age table grok does not work well. Both can be handled using ruby with a regexp scan.

    ruby {
        code => '
            msg = event.get("message")

            matches = msg&.scan(/- age \s*(\d+):\s*(\d+) bytes,\s*(\d+) total$/)
            ages = []
            matches&.each_index { |i|
                ages[matches[i][0].to_i] = { "size" => matches[i][1].to_i, "total" => matches[i][2].to_i }
            }
            event.set("ages", ages)

            matches = msg&.scan(/GC\(\d+\) (\w+) regions: (\d+)->(\d+)(\((\d+)\))?$/)
            matches&.each_index { |i|
                region = matches[i][0]
                event.set("[regions][#{region}]", { "before" => matches[i][1].to_i, "after" => matches[i][2].to_i })
                if matches[i][4]
                    event.set("[regions][#{region}][total]", matches[i][4].to_i)
                end
            }
        '
    }

which will produce

   "regions" => {
         "Eden" => {
         "total" => 297,
        "before" => 180,
         "after" => 0
    },
          "Old" => {
        "before" => 20,
         "after" => 20
    },
    "Humongous" => {
        "before" => 3,
         "after" => 3
    },
....

and

      "ages" => [
    [ 0] nil,
    [ 1] {
         "size" => 13258944,
        "total" => 13258944
    },
    [ 2] {
         "size" => 1069200,
        "total" => 14328144
    },
    [ 3] {
         "size" => 754040,
        "total" => 15082184
    },
...
1 Like

Awesome Badger..I am able to get all the required data. Not so comfortable with ruby, so i am proceeding with below grok patterns only as i do not need age values.

filter 
{
	if [fields][hc_type] and [fields][hc_type] == "gc-log" 
	{
		grok 
		{
			break_on_match => false
			overwrite => [ "gcnum", "id", "timestamp" ]
			match => {
            message => [
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}Pre Evacuate Collection Set: %{NUMBER:preecstimeinms})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}Merge Heap Roots: %{NUMBER:mhrtimeinms})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}Evacuate Collection Set: %{NUMBER:ecstimeinms})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}Post Evacuate Collection Set: %{NUMBER:postecstimeinms})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}Other: %{NUMBER:othertimeinms})",
                "^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Eden regions: %{NUMBER:[heap][eden][beforegc]}\-\>%{NUMBER:[heap][eden][aftergc]}\(%{NUMBER:[heap][eden][total]}\))",
                "^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Survivor regions: %{NUMBER:[heap][survivor][beforegc]}\-\>%{NUMBER:[heap][survivor][aftergc]}\(%{NUMBER:[heap][survivor][total]}\))",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Old regions: %{NUMBER:[heap][old][beforegc]}\-\>%{NUMBER:[heap][old][aftergc]})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Archive regions: %{NUMBER:[heap][archive][beforegc]}\-\>%{NUMBER:[heap][archive][aftergc]})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Humongous regions: %{NUMBER:[heap][humongous][beforegc]}\-\>%{NUMBER:[heap][humongous][aftergc]})",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[%{NOTSPACE}%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\) Metaspace: %{NUMBER:[metaspace][peak]}K\(%{NUMBER:[metaspace][allocated]}K\))",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[gc%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}%{NOTSPACE}%{SPACE}%{NOTSPACE}%{SPACE}\(%{NOTSPACE:type}\)%{SPACE}\(G1 Evacuation Pause\)%{SPACE}%{NUMBER:heapbeforegc}M\-\>%{NUMBER:heapaftergc}M)",
				"^(\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{NUMBER:id}\]\[gc,cpu%{SPACE}\] %{NOTSPACE}\(%{NUMBER:gcnum}\)%{SPACE}User=%{NUMBER:usercpuinsec}s%{SPACE}Sys=%{NUMBER:syscpuinsec}s)"
		   ]
        }
		}
		
	}	
}

Thanks once again :pray: :pray: