Getting "_groktimeout" error for a particular filter sometimes. Need help optimizing the filter

The line in question is a multiline event, so I'm using "codec => multiline" to assemble it and then applying a grok filter to the result.
But sometimes, when the event contains a lot of content, parsing fails with a "_groktimeout" error, and the logstash log file shows a WARN message like the one below.
Timeout executing grok '<grok filter>' against field 'message' with value 'Value too large to output (3138 bytes)! First 255 chars are: .....
Also, I see CPU usage at 100%, with the java process consuming most of it.

I went through previous posts and understand that my filter is a complicated one with many DATA and GREEDYDATA patterns to match, which is likely why it takes so long. I need help building a more efficient filter that returns the same key-value pairs.

Below is my logstash config file.

input {
  file {
    type => "mylog"
    tags => "log"
    path => [ "/Logs/mylog.log" ]
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
	 grok {
			  match => { "message" => "\[%{TIMESTAMP_ISO8601:log_timestamp}\] \[%{WORD:log_level}] \[tdid: %{NUMBER:tdid}\] \[userId: %{DATA:userId}\] \[myid: %{DATA:myid}\] \[%{TIMESTAMP_ISO8601:diagnostics_timestamp}\]\[p id:%{NUMBER:dpid}\]%{GREEDYDATA}(?<event_type>Start Data processing details)%{GREEDYDATA}Report path: (?<report_path>[^,\r\n]*)%{GREEDYDATA}Datamodel name: (?<datamodel_name>[^,\r\n]*)%{GREEDYDATA}User name: (?<user_name>[^,\r\n]*)%{GREEDYDATA}DB Session Module: (?<db_session_module>[^,\r\n]*)%{GREEDYDATA}DB Session Action: (?<db_session_action>[^,\r\n]*)%{GREEDYDATA}Total Number Of Rows Processed:(?<total_number_of_rows_processed>[^,\r\n]*)%{GREEDYDATA}Sql Action Id:(?<sql_action_id>[^,\r\n]*)%{GREEDYDATA}Total Number of Datasets: %{NUMBER:total_number_of_datasets}%{GREEDYDATA}Total Number of Elements: %{NUMBER:total_number_of_elements}%{GREEDYDATA}Total Parse Time: %{NUMBER:total_parse_time_ms}%{GREEDYDATA}Total Used Memory at Start: (?<total_used_memory_at_start>[^,\r\n]*)%{GREEDYDATA}Total Used Memory after Parsing: (?<total_used_memory_after_parsing>[^,\r\n]*)%{GREEDYDATA}Total Used Memory after Processing: (?<total_used_memory_after_processing>[^,\r\n]*)%{GREEDYDATA}Total Datamodel Validation Time: (?<total_datamodel_validation_time>[^,\r\n]*)%{GREEDYDATA}Total Xplan Generation Time: (?<total_xplan_generation_time>[^,\r\n]*)%{GREEDYDATA}Total Data Generation Time: (?<total_data_generation_time>[^,\r\n]*)%{GREEDYDATA}Total XML Data Size Generated: (?<total_xml_data_size_generated>[^,\r\n]*)%{GREEDYDATA}Processing Exceptions: (?<processing_exceptions>[^,\r\n]*)" }
      }
}

I've used "%{GREEDYDATA}" in multiple places because there could be zero or more lines between the patterns I'm trying to extract into the output.
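The cost of those chained GREEDYDATA segments can be sketched in plain Python (grok's regex engine backtracks in the same way; the sample text and field names below are illustrative, not taken from a real event):

```python
import re

# An event shaped like the log above: many "key: value" lines, one field of
# interest, and one trailing anchor that happens to be absent.
message = (
    "Start details\n"
    + "\n".join(f"key{i}: value {i}" for i in range(50))
    + "\nUser name: user1\n"
)

# Shape of the single big grok pattern: a greedy wildcard between every anchor.
# Each ".*" records a backtrack point, so when a later literal is absent the
# engine revisits combinations of split points before finally giving up.
chained = re.compile(
    r"(?s).*key1:.*key2:.*User name: ([^,\r\n]*).*Processing Exceptions:"
)
assert chained.search(message) is None  # fails, but only after backtracking

# A small pattern anchored on literal text matches (or fails) in one cheap scan.
anchored = re.compile(r"User name: ([^,\r\n]*)")
m = anchored.search(message)
assert m is not None and m.group(1) == "user1"
```

The more wildcards the single pattern contains, the more split-point combinations the engine may have to revisit on a near-miss, which is what drives the timeout on large events.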

Below is one particular matching line for which I got "_groktimeout" error.

"[2020-08-13T00:18:03.129+00:00] [INFO] [tdid: 136904] [userId: user1] [myid: 005f5ioH8qH3z035Rnh8id0003kz0001RV,1:33684] [2020-08-13T00:18:03.129+00:00][p id:1429221718][sch info:JobId:367679[RequestId:493608]][[
    Start Data processing details..
    ======================================================================
    processor Id: p{1429221718}
    Report path: /User/Human Capital Management/Interfaces/HCM_CLOUD_PROCESS
    Datamodel name: _User_Human_Capital_Management_Interfaces_HCM_CLOUD_PROCESS
    User name: user1
    DB Session Module: HCM_CLOUD_PROCESS
    DB Session Action: p{1429221718}{JobId:367679}
    key1: value key1
    key2: value key2
    key3: value key3
    key4: value key4
    key5: value key5
    key6: value key6
    key7: value key7
    key8: value key8
    key9: value key9
    key10: value key10
    key11: value key11
    key12: value key12
   .
   .
   .
   key_N: value key_N
        Total Number Of Times Query Executed:2
        Total Number Of Rows Processed:0
        Number of elements:0
        Total Query Execution time:1891
    ------------------------------------------------------------------------
    Total Number of Datasets: 5
    Total Number of Elements: 0
    Total Parse Time: 2
    Total Datamodel Validation Time: .000 mseconds
    Total Xplan Generation Time: .000 mseconds
    Total Data Generation Time: 2.000 seconds
    Total XML Data Size Generated: 1.03 KB
    Processing Exceptions: 
    ======================================================================
    End Data processing details..


"

I see no reason to use a single grok pattern to do this. You can match against multiple patterns. I have not tested this, but something like

grok {
    break_on_match => false
    match => {
        "message" => [
            "Report path: (?<report_path>[^,\r\n]*)",
            "Datamodel name: (?<datamodel_name>[^,\r\n]*)",
            "User name: (?<user_name>[^,\r\n]*)",
            ...
        ]
    }
}

These patterns have fixed text at the beginning, so they will be cheap to evaluate.
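As a rough illustration of why per-field anchored patterns are cheap, here is a plain-Python sketch (the excerpt is abbreviated from the event above; field names mirror the grok captures):

```python
import re

# Abbreviated excerpt of the multiline event shown in the question.
message = """Start Data processing details..
Report path: /User/Human Capital Management/Interfaces/HCM_CLOUD_PROCESS
Datamodel name: _User_Human_Capital_Management_Interfaces_HCM_CLOUD_PROCESS
User name: user1
Total Number of Datasets: 5"""

# One small pattern per field, mirroring break_on_match => false: each regex
# starts with literal text, so the engine can locate the anchor with a fast
# substring scan instead of backtracking through the whole event.
patterns = {
    "report_path": re.compile(r"Report path: ([^,\r\n]*)"),
    "datamodel_name": re.compile(r"Datamodel name: ([^,\r\n]*)"),
    "user_name": re.compile(r"User name: ([^,\r\n]*)"),
    "total_number_of_datasets": re.compile(r"Total Number of Datasets: (\d+)"),
}

fields = {
    name: m.group(1)
    for name, p in patterns.items()
    if (m := p.search(message))
}
assert fields["user_name"] == "user1"
assert fields["total_number_of_datasets"] == "5"
```

A field that is missing from a given event simply produces no key, rather than forcing the whole match to fail.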

Thank you; it works.
I have a few other problems as a result of this fix. I will create a separate topic for those.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.