Grok/logstash efficiency

Hi Guys,

I'm wondering whether the Logstash conf below could be improved in any way. One of my Logstash servers (a lower-powered one) handles about 300 e/s, and it is pinned at 100% CPU and 60% memory when things are busy. Could my grok patterns or anything else be improved?

Thanks,

Michael

Anchoring your grok patterns to start of line using ^ would help. Read this post.
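As a rough sketch of where the anchors go (the pattern here is shortened and hypothetical, not the full production one), `^` goes at the very start of the expression and `$` at the very end. The regex engine then attempts the match only from the beginning of the line, so a line that does not fit the pattern is rejected quickly instead of being retried at every offset:

```
filter {
  grok {
    match => {
      # Hypothetical, shortened pattern: note the leading ^ and trailing $
      "message" => "^%{DATA:time} \[%{DATA:platform}\] \[%{DATA:loglevel}\] - %{GREEDYDATA:logmessage}$"
    }
  }
}
```

The same applies to every entry when `match` takes a list of patterns, since the cost of a failed match is paid once per pattern that has to be tried.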

Thanks for that. I've added the start/end anchors and that seems to have sped things up, although CPU is still pinned at 100%. Any other ideas on performance tuning?

Thanks,

Michael

        match => {
          "message" => [
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Sequence:%{DATA:sequence}, Code:%{DATA:code}, ID:%{DATA:id}, AID:%{DATA:aid}, nToken:%{DATA:ntoken}, Method:%{DATA:method}, ExecutionTime:%{DATA:executiontime}ms, Response:%{GREEDYDATA:logmessage}",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Response from \[%{IPORHOST:ip1}, %{IPORHOST:ip2}\] to \[%{URI:url}\] ExecutionTime:%{NUMBER:executiontime}ms(%{GREEDYDATA:logmessage})?",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, Channel:%{DATA:channel}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}",
            "%{GREEDYDATA:time} \[%{DATA:platform}\] \[%{DATA:app}\] \[%{DATA:cid}\] \[%{DATA:tid}\] \[%{DATA:loglevel}\] \[%{DATA:logsource}\] - %{GREEDYDATA:logmessage}"
          ]
        }

Even if you anchor this, it still starts with a GREEDYDATA, so it will be very costly because it will do a lot of backtracking. All those square brackets delimit the data, so I would parse that part using dissect. Assuming the timestamp is two space-separated fields, I would do something like

dissect { mapping => { "message" => "%{ts} %{+ts} [%{platform}] [%{app}] [%{cid}] [%{tid}] [%{loglevel}] [%{logsource}] - %{restOfLine}" } }

Then use anchored groks to match restOfLine.
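Putting the two steps together, a sketch of the whole filter might look like the following. It assumes the timestamp really is two space-separated fields and reuses the field names and message tails from the patterns posted earlier; the grok expressions are the original ones with the common prefix removed and `^`/`$` added, so treat them as a starting point to test against real log lines rather than a verified config.

```
filter {
  # Split off the fixed, bracket-delimited prefix without any regex work
  dissect {
    mapping => {
      "message" => "%{ts} %{+ts} [%{platform}] [%{app}] [%{cid}] [%{tid}] [%{loglevel}] [%{logsource}] - %{restOfLine}"
    }
  }

  # Only the variable tail of the line goes through grok, with anchored patterns
  grok {
    match => {
      "restOfLine" => [
        "^Sequence:%{DATA:sequence}, Code:%{DATA:code}, ID:%{DATA:id}, AID:%{DATA:aid}, nToken:%{DATA:ntoken}, Method:%{DATA:method}, ExecutionTime:%{DATA:executiontime}ms, Response:%{GREEDYDATA:logmessage}$",
        "^Response from \[%{IPORHOST:ip1}, %{IPORHOST:ip2}\] to \[%{URI:url}\] ExecutionTime:%{NUMBER:executiontime}ms(%{GREEDYDATA:logmessage})?$",
        "^Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, Channel:%{DATA:channel}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}$",
        "^Code:%{DATA:code}, ID:%{DATA:id}, rID:%{DATA:rid}, ID:%{DATA:aid}. %{GREEDYDATA:logmessage}$"
      ]
    }
  }
}
```

The catch-all fifth pattern is left out here on purpose: lines that match none of the four tails keep their text in restOfLine and are tagged `_grokparsefailure` by default, which can be handled with a conditional afterwards if the text needs to end up in logmessage.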

That particular pattern was given to me by the Elastic consultant that Elastic sent! But what you're saying makes sense. I've added the anchors and I'm already seeing a massive improvement.

(The data before and after the break is from before and after I made the change and restarted Logstash.)

Thank you very much for your help, I'll look at your other suggestions too.

Thanks,

Michael