How to improve the Logstash grok filters below?

Hi there. Logstash is consuming most of the CPU all the time, and events are not being parsed in real time. Using Stack Monitoring in Kibana, I found that one set of filters for a particular source performs poorly, and that appears to be causing the problem.
The filters were written for Tibco BW 5.13 logs. Can anyone advise what I should change or avoid in the filters below to improve performance?

    grok {
      match => {
        "message" => [
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW.%{NOTSPACE:APPLICATION}%{SPACE}%{NOTSPACE:USER}%{SPACE}\[%{NOTSPACE:BW_USER}]%{SPACE}%{NOTSPACE}%{SPACE}\[%{DATESTAMP}]%{SPACE}%{NOTSPACE}%{SPACE}\[%{LOGLEVEL:LOG_LEVEL}]\[%{NOTSPACE:BW_PROCESS}]%{SPACE}\[%{JAVAFILE:PROCESS_ID}]\[%{JAVAFILE:EAR_NAME}]\[%{HOSTNAME}]%{GREEDYDATA:MSG}",
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW.%{NOTSPACE:APPLICATION}%{SPACE}%{NOTSPACE:USER}%{SPACE}\[%{NOTSPACE:BW_USER}]%{SPACE}%{NOTSPACE}%{SPACE}\[%{DATESTAMP}]%{SPACE}%{NOTSPACE}%{SPACE}\[%{LOGLEVEL:LOG_LEVEL}]\[%{NOTSPACE:BW_PROCESS}%{SPACE}%{NOTSPACE:BW_PROCESS}]%{SPACE}\[%{JAVAFILE:PROCESS_ID}]\[%{JAVAFILE:EAR_NAME}]\[%{HOSTNAME}]%{GREEDYDATA:MSG}",
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW.%{NOTSPACE:APPLICATION}%{SPACE}%{NOTSPACE:USER}%{SPACE}\[%{NOTSPACE:BW_USER}]%{SPACE}%{NOTSPACE}%{SPACE}%{NOTSPACE:JOB_ID}%{SPACE}\[%{NOTSPACE:BW_PROCESS}]:%{SPACE}\[%{DATA}]%{SPACE}\[%{NOTSPACE:JOB}]%{SPACE}\[%{LOGLEVEL:LOG_LEVEL}]%{GREEDYDATA:MSG}",
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW.%{NOTSPACE:APPLICATION}%{SPACE}%{NOTSPACE:USER}%{SPACE}\[%{NOTSPACE:BW_USER}]%{SPACE}%{NOTSPACE}%{SPACE}%{NOTSPACE:JOB}%{SPACE}\[%{DATA}]:%{SPACE}\[%{DATESTAMP}]%{SPACE}\[%{NOTSPACE}]%{SPACE}\[%{LOGLEVEL:LOG_LEVEL}]%{NOTSPACE:BW_PROCESS}%{SPACE}\[%{NOTSPACE}]%{GREEDYDATA:MSG}",
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW.%{NOTSPACE:APPLICATION}%{SPACE}%{LOGLEVEL:LOG_LEVEL}%{SPACE}\[%{USERNAME:BWUSER}]%{SPACE}%{GREEDYDATA:MSG}",
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW.%{NOTSPACE:APPLICATION}%{SPACE}%{NOTSPACE}%{SPACE}%{NOTSPACE:BWUSER}%{SPACE}%{NOTSPACE}%{SPACE}%{NOTSPACE:JOB}%{SPACE}%{NOTSPACE}:%{SPACE}%{SYSLOG5424SD:LOG_TMSTMP}%{SPACE}%{NOTSPACE}%{SPACE}\[%{LOGLEVEL:LOG_LEVEL}]\[%{NOTSPACE:BW_PROCESS}]%{SPACE}\[%{JAVAFILE:PROCESS_ID}]\[%{JAVAFILE:EAR_NAME}]\[%{HOSTNAME}]%{GREEDYDATA:MSG}",
          "(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ}) %{GREEDYDATA:MSG}",
          "%{GREEDYDATA:MSG_NOT_FILTERED}"
        ]
      }
    }

If your messages always start with TMSTMP, anchor the patterns using ^. This makes a failure to match much cheaper, because the regex engine no longer retries the pattern at every position in the line. Read this blog post.
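
As a rough illustration of the anchoring advice, here is one of the shorter patterns from the question with ^ added at the start. This is only a sketch: it assumes every line really does begin with the timestamp, and it also escapes the dot in BW\. so that it matches a literal dot, which is a change from the original pattern.

    grok {
      match => {
        "message" => [
          # ^ pins the match to the start of the line, so a line that does not
          # begin with a timestamp is rejected after a single attempt.
          "^(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD:TMZ} %{INT:TMZ})%{SPACE}BW\.%{NOTSPACE:APPLICATION}%{SPACE}%{LOGLEVEL:LOG_LEVEL}%{SPACE}\[%{USERNAME:BWUSER}]%{SPACE}%{GREEDYDATA:MSG}"
        ]
      }
    }

If lines stop matching after adding ^, it may be worth checking for leading whitespace or other characters before the timestamp in the raw events.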

Besides anchoring, putting the most frequently matched patterns first will help reduce CPU usage, since grok tries the patterns in order and stops at the first one that matches.

You could also see if you can replace your grok with a couple of dissect filters and conditionals; this can reduce CPU usage even more.
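
A minimal sketch of the dissect idea, under some assumptions: the fields in the common prefix are separated by exactly one space, and the field names YEAR, MONTH, DAY, TIME, TZ, TZ_OFFSET and REST are hypothetical names chosen for illustration. Dissect splits on literal delimiters instead of running regular expressions, which is why it tends to be cheaper than grok.

    filter {
      dissect {
        mapping => {
          # Splits the shared prefix on single spaces and the literal "BW."
          # Everything after the application name ends up in REST.
          "message" => "%{YEAR} %{MONTH} %{DAY} %{TIME} %{TZ} %{TZ_OFFSET} BW.%{APPLICATION} %{REST}"
        }
      }
      # dissect adds the _dissectfailure tag by default when the mapping
      # does not fit the line.
      if "_dissectfailure" in [tags] {
        # Keep the raw line, like the catch-all pattern in the question.
        mutate { copy => { "message" => "MSG_NOT_FILTERED" } }
      }
    }

The different shapes of REST could then be handled with further conditionals, each with its own dissect or a small anchored grok. If the real logs pad fields with a variable number of spaces, this mapping would need adjusting.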

I've experimented with anchors before: performance was the same, and many logs stopped matching even though I changed nothing else in the filters, so I decided not to use them. The blog also shows a tiered matching strategy; I'm going to try that. Thanks for your reply!
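
For reference, a tiered approach might look roughly like the sketch below. It assumes all interesting lines share the timestamp and BW.<application> prefix; the REST field and the _prefix_parse_failure tag are hypothetical names, and only one of the original variants is shown in the second tier.

    filter {
      # Tier 1: match the common prefix once and keep the remainder in REST.
      grok {
        match => {
          "message" => "^(?<TMSTMP>%{YEAR} %{MONTH} %{MONTHDAY} %{TIME} %{WORD} %{INT})%{SPACE}BW\.%{NOTSPACE:APPLICATION}%{SPACE}%{GREEDYDATA:REST}"
        }
        tag_on_failure => ["_prefix_parse_failure"]
      }
      # Tier 2: only lines that passed tier 1 are tested against the
      # variant-specific patterns, which are now much shorter.
      if "_prefix_parse_failure" not in [tags] {
        grok {
          match => {
            "REST" => [
              "^%{LOGLEVEL:LOG_LEVEL}%{SPACE}\[%{USERNAME:BWUSER}]%{SPACE}%{GREEDYDATA:MSG}",
              "^%{GREEDYDATA:MSG}"
            ]
          }
        }
      }
    }

The point of the split is that the expensive timestamp prefix is evaluated once per event rather than once per pattern in the list.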
