Grok filter help!

Inammathe_Inna · July 17, 2018, 12:16am

Hi, I am having a hard time getting my Logstash pipeline to process messages efficiently.

I am running at 100% CPU utilization (4 CPUs and 4 pipeline workers) and only processing about 20-30 documents a minute (measured using the metrics plugin). /etc/logstash/Test/testlog.log is just a 1 GB file of about 9 common log lines repeated. Memory usage remains low (8 GB available).

I am unsure at this point whether the number of CPUs is the issue or the grok filters I am using.

Using only "%{GREEDYDATA:logmessage}" as the grok filter, it shoots up to over 15k every minute, so I am more inclined to think that I am using the grok filters incorrectly.

If anyone is able to spot a way I can improve the grok filter, I'd be a very happy chappy.
I can also try throwing more CPUs at it; however, I want to be sure I am not doing something silly with my filtering first.

My config:

input {
    file {
        path => [ "/etc/logstash/Test/testlog.log" ]
        start_position  => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    
    metrics {
        meter => "documents"
        add_tag => "metric"
    }

    grok {
        match => [
            "message", "%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{UUID:Id}: %{GREEDYDATA:logmessage}",

            "message", "%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{NUMBER:mstaken}ms %{UUID:Id}: %{GREEDYDATA:logmessage}",

            "message", "%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{GREEDYDATA:logmessage}",

            "message", "(?<logmessage>Soss.*)"
        ]
    }
}

output {
    if "_grokparsefailure" in [tags] {
        file { path => "/tmp/grok_errors.log" }
    }
    if "metric" in [tags] {
        stdout {
            codec => line {
                format => "1m rate: %{[documents][rate_1m]} ( %{[documents][count]} )"
            }
        }
    }
}

Thank you!

Badger · July 17, 2018, 12:35am

If you are trying to match the entire line, then it will help a lot to anchor your grok patterns to the start of line using ^. For example

"message", "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime}...

If you do not anchor, it will try to match the pattern at the first character of the line. If that fails, it tries matching the pattern starting at the second character, and so on. If you have a 100-character line, it will try the pattern from 100 different positions. If you anchor it to the start of the line, it tries the pattern once. That can significantly speed things up.

See the Elastic blog post for more detail.
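
As a rough sketch, this is what the anchoring described above might look like when applied to the grok block from the original post. The only change to the first three patterns is the leading ^. The fourth pattern is left unanchored on the assumption that "Soss" may appear anywhere in the line, which the thread does not confirm.

```
filter {
    grok {
        # The leading ^ pins each pattern to the start of the line, so a line
        # that does not match fails after one attempt instead of one attempt
        # per starting character.
        # Assumption: "Soss" can occur mid-line, so the last pattern has no ^.
        match => [
            "message", "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{UUID:Id}: %{GREEDYDATA:logmessage}",

            "message", "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{NUMBER:mstaken}ms %{UUID:Id}: %{GREEDYDATA:logmessage}",

            "message", "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{GREEDYDATA:logmessage}",

            "message", "(?<logmessage>Soss.*)"
        ]
    }
}
```

If the last pattern is also expected to start at the beginning of the line, it could be anchored the same way. That depends on the actual log data, so it is worth checking against /tmp/grok_errors.log before changing it.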

Inammathe_Inna · July 17, 2018, 1:04am

ha! you legend!

That one little character has bumped it up to 900/min and it is still climbing.

THANK YOU!

russellrexroad · August 4, 2018, 7:48am

I totally agree with Badger. If you don't anchor, the grok filter will attempt to match the pattern at the first character of the line. If that fails, it tries matching the pattern starting at the second character. If you have a 100-character line, it will try the pattern from 100 different positions. If you anchor it to the start of the line, it tries the pattern once. That can significantly speed things up.

Elizabetholsen · August 18, 2018, 4:55am

Data transformation and normalization in Logstash is performed using filter plugins. This article focuses on one of the most popular and useful filter plugins: the Logstash grok filter. Grok is used to parse unstructured data into structured data, making it ready for aggregation and analysis in the ELK stack.

system · September 15, 2018, 4:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
