I am trying to parse logs using multiple grok filters. I have noticed that if I add the following pattern, Logstash CPU usage climbs above 300%. I also have a couple of grok patterns for nginx logs that cause no problems: with those, Logstash CPU usage averages 10-30%. Is it possible to modify this pattern, or is there an alternative approach, to avoid the high CPU usage?
Grok Pattern:
My logstash.yml file has no active settings; every line starts with #.
I think the pipeline settings might solve this. Can anyone point me to a page that explains best practices for pipeline settings?
Thanks
# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
# pipeline.workers: 2
#
# How many events to retrieve from inputs before sending to filters+workers
#
# pipeline.batch.size: 125
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
# pipeline.batch.delay: 50
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# ------------ Pipeline Configuration Settings --------------
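As a rough sketch of how the settings above could be used to cap the CPU that the filter stage consumes, the worker count can be uncommented and lowered. The value 2 is an assumption chosen for illustration, not a recommendation; the other two values are simply the ones shown in the file.

```yaml
# logstash.yml (sketch; values are illustrative)

# Fewer workers means fewer threads running grok in parallel,
# which bounds filter CPU usage at the cost of throughput.
pipeline.workers: 2

# Shown with the same values as in the commented-out file above.
pipeline.batch.size: 125
pipeline.batch.delay: 50
```

This only limits how many filter threads run at once. It does not make an expensive pattern any cheaper, so events may queue up if the pattern itself stays slow.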
That pattern is extremely resource intensive. Mixing many DATA and GREEDYDATA patterns in one grok expression is expensive, because each of them is a wildcard (`.*?` and `.*`) that the regex engine may have to backtrack through. You may want to look into Dissect: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html. Since you only have a single pattern, it would be much more efficient.
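To illustrate the Dissect suggestion, here is a minimal sketch. The log layout (timestamp, level, component and free-text message separated by single spaces) and the field names are hypothetical, since the actual log format is not shown in this thread.

```
filter {
  dissect {
    mapping => {
      # Hypothetical layout: "<timestamp> <level> <component> <rest of line>"
      # Dissect splits on the literal delimiters between fields (spaces here)
      # instead of running a regular expression, so there is no backtracking.
      "message" => "%{timestamp} %{level} %{component} %{msg}"
    }
  }
}
```

The last field in the mapping takes whatever remains of the line. Dissect only fits logs whose delimiters appear in the same order on every line; where the structure varies, grok is still needed.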
Thanks for your suggestions!
I changed my grok to the following format, and it looks OK in the debug tool. I couldn't change the last DATA at the end of the pattern because the match fails without it. This is a production environment, so I don't have much chance to experiment, but I will let you know the result once this change is deployed. By the way, other suggestions would be greatly appreciated!
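Two further options that are often suggested for expensive grok filters are anchoring the expression and replacing wildcards with more specific patterns. The sketch below uses a hypothetical log layout and hypothetical field names, not the pattern from this thread, and the timeout value is an arbitrary example.

```
filter {
  grok {
    # ^ and $ anchor the expression, so non-matching lines fail quickly
    # instead of being retried from every position in the string.
    # NOTSPACE replaces DATA wherever a field cannot contain spaces.
    match => {
      "message" => "^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:component} %{GREEDYDATA:msg}$"
    }
    # Abort a single match attempt after 5 seconds (illustrative value).
    timeout_millis => 5000
  }
}
```

Keeping one GREEDYDATA at the very end of an anchored pattern is generally cheap; the cost tends to come from several wildcards in the middle of the expression.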