CPU usage for Logstash hits over 300%

Hello everybody,

I am trying to parse logs using multiple grok filters. I have noticed that when I add the following pattern, Logstash CPU usage hits over 300%. I also have a couple of grok patterns for nginx logs that cause no problem; with those, Logstash CPU usage averages 10-30%. Is it possible to modify this pattern somehow, or is there an alternative solution, to overcome the high CPU problem?
Grok Pattern:

    else if [log][file][path] =~ "tomcat8" {
      grok {
        match => { "message" => ["%{DATA:[tomcat][event][date]} %{DATA:[tomcat][event][time]}\s*\ %{DATA:[tomcat][event][level]} %{DATA:[tomcat][event][server]} \--- \[%{DATA:[tomcat][event][logger]}\] %{DATA:[tomcat][event][variable]}\s*\ \: %{GREEDYDATA:[tomcat][event][message]}"] }
        remove_field => "message"
      }
    }

The following is a sample log entry that matches this pattern:
message >
2020-11-10 03:39:15.979 INFO srv-prj-prod1 --- [veu-3108-exec-6] x.b.c.r.y.DceProfiler : execution-time: 256 ms - http://127.0.0.1:3108/privacyManagement/partyPrivacyProfile?partyPrivacyProfileCharacteristic[0].contactMedium.value - prod.backend.thirdparty
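
which the pattern should break down into these fields:

    [tomcat][event][date]     => 2020-11-10
    [tomcat][event][time]     => 03:39:15.979
    [tomcat][event][level]    => INFO
    [tomcat][event][server]   => srv-prj-prod1
    [tomcat][event][logger]   => veu-3108-exec-6
    [tomcat][event][variable] => x.b.c.r.y.DceProfiler
    [tomcat][event][message]  => execution-time: 256 ms - http://127.0.0.1:3108/privacyManagement/partyPrivacyProfile?partyPrivacyProfileCharacteristic[0].contactMedium.value - prod.backend.thirdparty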

My logstash.yml file has no active settings; every line starts with #.

I think the pipeline settings might help with this. Can anyone point me to a page that explains best practices for pipeline settings?
Thanks

# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
# pipeline.workers: 2
#
# How many events to retrieve from inputs before sending to filters+workers
#
# pipeline.batch.size: 125
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
# pipeline.batch.delay: 50
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# ------------ Pipeline Configuration Settings --------------
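
If pipeline tuning is the answer, I assume it just means uncommenting and adjusting values like these in logstash.yml (the numbers below are placeholders I have not tested, not recommendations):

    pipeline.workers: 4
    pipeline.batch.size: 250
    pipeline.batch.delay: 50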

That pattern is extremely resource intensive. Having many DATA and GREEDYDATA grok patterns mixed together takes a lot of resources. You may want to look into Dissect: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html. Since you only have a single pattern, it would be much more efficient.
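
As an untested sketch, keeping your field names, the dissect equivalent could look something like this. The `->` modifier lets the time field soak up any padding spaces before the log level; note that if the logger column is space-padded before the colon, [tomcat][event][variable] would keep trailing spaces:

    dissect {
      # same field names as the grok version; "->" skips repeated space delimiters
      mapping => {
        "message" => "%{[tomcat][event][date]} %{[tomcat][event][time]->} %{[tomcat][event][level]} %{[tomcat][event][server]} --- [%{[tomcat][event][logger]}] %{[tomcat][event][variable]} : %{[tomcat][event][message]}"
      }
      remove_field => ["message"]
    }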

Patterns like DATA, and especially GREEDYDATA, are very expensive when they do not match, since they result in a lot of backtracking.

Try to modify your pattern to avoid them. Perhaps you can use NOTSPACE or something else that is cheaper. Or use dissect.
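
For reference, these are the underlying regular expressions in the stock grok-patterns definitions, which is why the first two backtrack so heavily while NOTSPACE can fail fast:

    DATA       .*?
    GREEDYDATA .*
    NOTSPACE   \S+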

Thanks for your suggestions!
I changed my grok to the following format, and it looks OK in the debug tool. I couldn't change the last DATA at the end of the pattern, because matching fails without it. This is a production environment, so I don't have much chance to experiment easily, but I will let you know the result when this change is implemented. Btw, other suggestions would be greatly appreciated!

^%{NOTSPACE:[tomcat][event][date]} %{NOTSPACE:[tomcat][event][time]}\s*\ %{NOTSPACE:[tomcat][event][level]} %{NOTSPACE:[tomcat][event][server]} \--- \[%{NOTSPACE:[tomcat][event][logger]}\] %{NOTSPACE:[tomcat][event][variable]}\s*\ \: %{DATA:[tomcat][event][message]}$
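
Dropped into the existing conditional, the updated filter looks like this:

    else if [log][file][path] =~ "tomcat8" {
      grok {
        match => { "message" => ["^%{NOTSPACE:[tomcat][event][date]} %{NOTSPACE:[tomcat][event][time]}\s*\ %{NOTSPACE:[tomcat][event][level]} %{NOTSPACE:[tomcat][event][server]} \--- \[%{NOTSPACE:[tomcat][event][logger]}\] %{NOTSPACE:[tomcat][event][variable]}\s*\ \: %{DATA:[tomcat][event][message]}$"] }
        remove_field => "message"
      }
    }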

It worked! Replacing DATA with NOTSPACE solved the high CPU problem.
