Grok filter help!

Hi, I am having a hard time getting my Logstash pipeline to process messages efficiently.

I am running at 100% CPU utilization (4 CPUs and 4 pipeline workers) and only processing about 20-30 documents a minute (measured using the metrics plugin). /etc/logstash/Test/testlog.log is just a 1 GB file of about 9 common log lines repeated. Memory usage remains low (8 GB available).

I am unsure at this point whether the number of CPUs is the issue or the grok filters I am using.

Using only %{GREEDYDATA:logmessage} as the grok filter, throughput shoots up to over 15k documents a minute, so I am more inclined to think that I am using the grok filters incorrectly.

If anyone is able to spot a way I can improve the grok filter I'd be a very happy chappy :slight_smile:
I can also try throwing more CPUs at it, but I want to be sure I am not doing something silly with my filtering first.

My config:

input {
    file {
        path => [ "/etc/logstash/Test/testlog.log" ]
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    metrics {
        meter => "documents"
        add_tag => "metric"
    }

    grok {
        match => [
            "message", "%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{UUID:Id}: %{GREEDYDATA:logmessage}",

            "message", "%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{NUMBER:mstaken}ms %{UUID:Id}: %{GREEDYDATA:logmessage}",

            "message", "%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{GREEDYDATA:logmessage}",

            "message", "(?<logmessage>Soss.*)"
        ]
    }
}

output {
    if "_grokparsefailure" in [tags] {
        file { path => "/tmp/grok_errors.log" }
    }
    if "metric" in [tags] {
        stdout {
            codec => line {
                format => "1m rate: %{[documents][rate_1m]} ( %{[documents][count]} )"
            }
        }
    }
}

Thank you!

If you are trying to match the entire line, then it will help a lot to anchor your grok patterns to the start of the line using ^. For example:

"message", "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime}...

If you do not anchor, it will try to match the pattern at the first character of the line. If that fails, it tries matching the pattern starting at the second character. If you have a 100 character line and nothing matches, it will try the pattern from 100 different positions. If you anchor it to the start of the line, it tries the pattern once. That can significantly speed things up.
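Applied to the grok filter from the question, the anchoring might look like the sketch below. It is only an illustration, not a tested configuration: the patterns are copied from the question with a leading ^ added, and the list is written in the hash form of the match option. The last pattern is left unanchored because the thread does not say whether "Soss" always starts the line; that is an assumption to check against the real logs.

```
filter {
    grok {
        match => {
            "message" => [
                # Same three full-line patterns as in the question, each anchored with ^
                # so a non-matching line is rejected after a single attempt.
                "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{UUID:Id}: %{GREEDYDATA:logmessage}",
                "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{WORD:OperationType}: %{NUMBER:mstaken}ms %{UUID:Id}: %{GREEDYDATA:logmessage}",
                "^%{INT:Version} %{DATA:EventTime} %{DATA:LoggedTime} %{INT:SeqNo} %{INT:Level} %{DATA:NetworkAddr} %{WORD:HostName}   %{WORD:AppName}   %{INT:DiscardCount} %{INT:heartbeat} %{INT:Flags} %{GREEDYDATA:logmessage}",
                # Left unanchored on purpose (assumption: "Soss" may appear mid-line).
                "(?<logmessage>Soss.*)"
            ]
        }
    }
}
```

Grok tries the patterns in order and stops at the first one that matches, so every line that falls through to a later pattern has already paid for the failed attempts before it. Anchoring makes each of those failures cheap.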

See the Elastic blog post for more detail.


ha! you legend!

That one little character has bumped it up to 900/min and is still climbing


I totally agree with Badger. If you don't anchor, grok will attempt to match the pattern at the first character of the line. If that fails, it tries matching the pattern starting at the second character. If you have a 100 character line, it will try the pattern from 100 different positions. If you anchor it to the start of the line, it tries the pattern once. That can significantly speed things up.

Data transformation and normalization in Logstash are performed using filter plugins. Grok is one of the most popular and useful of these: it parses unstructured data into structured data, making it ready for aggregation and analysis in the ELK stack.
