Grok/logstash efficiency

Hi Guys,

I'm wondering whether the Logstash conf below could be improved in any way. One of my Logstash servers (a lower-powered one) handles about 300 e/s and is pinned at 100% CPU and 60% memory when things are busy. Could my grok patterns or anything else be made more efficient?



Anchoring your grok patterns to the start of the line using ^ would help. Read this post.
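As a rough illustration of what anchoring looks like, here is a minimal sketch. The log layout and the field names (time, loglevel, logmessage) are hypothetical and only stand in for a real pattern; TIMESTAMP_ISO8601, LOGLEVEL and GREEDYDATA are stock grok patterns.

```
filter {
  # ^ pins the match to the start of the line and $ to the end.
  # Without them, a line that does not match is retried from every
  # offset in the message before grok gives up, which burns CPU.
  grok {
    match => {
      "message" => "^%{TIMESTAMP_ISO8601:time} \[%{LOGLEVEL:loglevel}\] %{GREEDYDATA:logmessage}$"
    }
  }
}
```

The saving shows up mostly on lines that fail to match, so it matters most when several patterns are tried in sequence.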

Thanks for that. I've added the start/end anchors and that seems to have sped things up, although CPU is still pinned at 100%. Any other ideas on performance tuning?



        match => {
          "message" => [
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Sequence:%{DATA:sequence}, Code:%{DATA:code}, ID:%{DATA:id}, AID:%{DATA:aid}, nToken:%{DATA:ntoken}, Method:%{DATA:method}, ExecutionTime:%{DATA:executiontime}ms, Response:%{GREEDYDATA:logmessage}",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Response from \[%{IPORHOST:ip1}, %{IPORHOST:ip2}\] to \[%{URI:url}\] ExecutionTime:%{NUMBER:executiontime}ms(%{GREEDYDATA:logmessage})?",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, Channel:%{DATA:channel}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - %{GREEDYDATA:logmessage}"
          ]
        }


Even if you anchor this, every pattern still starts with a GREEDYDATA, so matching will be very costly because the regex engine has to do a lot of backtracking. You have all those square brackets delimiting the data, so I would split on them using dissect instead. Assuming the timestamp is two space-separated fields, I would do something like:

dissect { mapping => { "message" => "%{ts} %{+ts} [%{platform}] [%{app}] [%{cid}] [%{tid}] [%{loglevel}] [%{logsource}] - %{restOfLine}" } }

Then use anchored groks to match restOfLine.
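Put together, the two steps might look roughly like the sketch below. It assumes the dissect mapping above fits every line, and it reuses the tails of the original grok patterns unchanged apart from the added ^ and $ anchors. I have not run it against real data, so treat it as a starting point.

```
filter {
  # Step 1: dissect splits the fixed, bracket-delimited prefix
  # without using regular expressions at all.
  dissect {
    mapping => {
      "message" => "%{ts} %{+ts} [%{platform}] [%{app}] [%{cid}] [%{tid}] [%{loglevel}] [%{logsource}] - %{restOfLine}"
    }
  }

  # Step 2: anchored groks only have to parse the variable tail.
  # Patterns are tried in order; the last one is the catch-all.
  grok {
    match => {
      "restOfLine" => [
        "^Sequence:%{DATA:sequence}, Code:%{DATA:code}, ID:%{DATA:id}, AID:%{DATA:aid}, nToken:%{DATA:ntoken}, Method:%{DATA:method}, ExecutionTime:%{DATA:executiontime}ms, Response:%{GREEDYDATA:logmessage}$",
        "^Response from \[%{IPORHOST:ip1}, %{IPORHOST:ip2}\] to \[%{URI:url}\] ExecutionTime:%{NUMBER:executiontime}ms(%{GREEDYDATA:logmessage})?$",
        "^Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, Channel:%{DATA:channel}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}$",
        "^Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}$",
        "^%{GREEDYDATA:logmessage}$"
      ]
    }
  }
}
```

With this layout the timestamp ends up in ts rather than time, so any date filter further down the pipeline would need to read from ts.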

That particular pattern was given to me by the consultant that Elastic sent! But what you're saying makes sense. I've added the anchors and I'm already seeing a massive improvement.

(The break marks the point where I made the change and restarted Logstash: before it is the old config, after it the new one.)

Thank you very much for your help. I'll look at your other suggestions too.


