Grok/logstash efficiency

if-meaton · August 14, 2018, 8:30pm

Hi Guys,

I'm wondering whether the Logstash config below could be improved in any way. One of my Logstash servers (a lower-powered one) handles about 300 e/s, and it is pinned at 100% CPU and 60% memory when things are busy. Could my grok patterns or anything else be improved?

https://gist.github.com/if-meaton/f870f108d6ee7860f2e8e85d408a32f8

logstash.conf

input {
  beats {
    client_inactivity_timeout => 1200
    id => "{{ ansible_hostname }}"
    port => 5001
  }
}

filter {

(The embedded file is truncated here; see the gist linked above for the full config.)

Thanks,

Michael

Badger · August 14, 2018, 8:33pm

Anchoring your grok patterns to the start of the line using ^ would help. Read this post.
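
To make the idea concrete, here is a small sketch with a made-up pattern. The field names and log layout are hypothetical and not taken from the gist. Without ^, a line that does not match forces the regex engine to retry the pattern from every character offset before giving up; with ^, the attempt is essentially limited to the start of the line, so non-matching lines fail quickly.

```
filter {
  grok {
    # Unanchored version, for comparison:
    #   "%{LOGLEVEL:level} \[%{DATA:thread}\] %{GREEDYDATA:msg}"
    # Anchored version: ^ pins the match to the start, $ to the end.
    match => {
      "message" => "^%{LOGLEVEL:level} \[%{DATA:thread}\] %{GREEDYDATA:msg}$"
    }
  }
}
```

The saving matters most when several patterns are listed, because every pattern that fails has to be tried in full before grok moves on to the next one.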

if-meaton · August 14, 2018, 9:08pm

Thanks for that. I've added the start/end anchors and that seems to have sped things up, although CPU is still pinned at 100%. Any other ideas on performance tuning?

Thanks,

Michael

Badger · August 14, 2018, 9:16pm

match => {
  "message" => [
    "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Sequence:%{DATA:sequence}, Code:%{DATA:code}, ID:%{DATA:id}, AID:%{DATA:aid}, nToken:%{DATA:ntoken}, Method:%{DATA:method}, ExecutionTime:%{DATA:executiontime}ms, Response:%{GREEDYDATA:logmessage}",
    "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Response from \[%{IPORHOST:ip1}, %{IPORHOST:ip2}\] to \[%{URI:url}\] ExecutionTime:%{NUMBER:executiontime}ms(%{GREEDYDATA:logmessage})?",
    "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, Channel:%{DATA:channel}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}",
    "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}",
    "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - %{GREEDYDATA:logmessage}"
  ]
}

Even if you anchor this, every pattern still starts with a GREEDYDATA, so it will be very costly because it will do a lot of backtracking. You have all those square brackets to delimit the data, so I would parse that part using dissect. Assuming the timestamp is two space-separated fields, I would do something like

dissect {
  mapping => {
    "message" => "%{ts} %{+ts} [%{platform}] [%{app}] [%{cid}] [%{tid}] [%{loglevel}] [%{logsource}] - %{restOfLine}"
  }
}

Then use anchored groks to match restOfLine.
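
A minimal sketch of how the two stages could fit together, assuming the timestamp really is two space-separated fields and every line carries the six bracketed fields. Only two of the original alternatives plus the catch-all are shown; the remaining ones would be anchored the same way. The remove_field cleanup is my own addition for illustration and is not part of the original config.

```
filter {
  # Split the fixed, delimiter-separated prefix without any regex.
  dissect {
    mapping => {
      "message" => "%{ts} %{+ts} [%{platform}] [%{app}] [%{cid}] [%{tid}] [%{loglevel}] [%{logsource}] - %{restOfLine}"
    }
  }

  # Only the variable tail goes through grok, and every alternative is anchored.
  grok {
    match => {
      "restOfLine" => [
        "^Sequence:%{DATA:sequence}, Code:%{DATA:code}, ID:%{DATA:id}, AID:%{DATA:aid}, nToken:%{DATA:ntoken}, Method:%{DATA:method}, ExecutionTime:%{DATA:executiontime}ms, Response:%{GREEDYDATA:logmessage}$",
        "^Response from \[%{IPORHOST:ip1}, %{IPORHOST:ip2}\] to \[%{URI:url}\] ExecutionTime:%{NUMBER:executiontime}ms(%{GREEDYDATA:logmessage})?$",
        "^%{GREEDYDATA:logmessage}$"
      ]
    }
    # Applied only when grok succeeds; drops the intermediate field.
    remove_field => [ "restOfLine" ]
  }
}
```

Because each alternative now begins with a literal such as Sequence: or Response from, a line that does not fit is rejected after a few characters instead of after a full backtracking scan. Lines that do not fit the dissect mapping are tagged _dissectfailure by default, which makes them easy to find.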

if-meaton · August 14, 2018, 9:20pm

That particular pattern was given to me by the consultant that Elastic sent! But what you're saying makes sense. I've added the anchors and I'm already seeing a massive improvement.

(The break marks the point where I made the change and restarted Logstash: before it is the old config, after it the new one.)

Thank you very much for your help, I'll look at your other suggestions too.

Thanks,

Michael

system · September 11, 2018, 9:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
