Logstash using 100% of one CPU core

ruda_porto · March 11, 2016, 8:18pm

Hi,

I'm using Logstash to parse two log files and use multiline filter of then.
I experienced high CPU usage (100% of one CPU core) but on my log a have only this message:

tail -n40 -f /var/log/logstash/logstash.log 
{:timestamp=>"2016-03-11T16:33:11.144000-0300", :message=>"Defaulting pipeline worker threads to 1 because there are some filters that might not work with multiple worker threads", :count_was=>4, :filters=>["multiline"], :level=>:warn}

And follow my config file:

input { 
    file { 
        path => "/appdata/logs/passaporte/passaporte.log" 
        type => 'log_passaporte' 
    }   
    file {
        path => "/appdata/logs/passaporte/auditoria.log"
        type => 'log_auditoria'
    }
} 
filter{ 

    if [type] == "log_passaporte" {
        multiline {
            pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)"
            what => "previous"
        }           
        
        grok {
            match => [ "message", "%{DATESTAMP:data} \[%{USERNAME:thread}\] %{WORD:level} %{GREEDYDATA:classe} \(%{GREEDYDATA:token}\) \(%{GREEDYDATA:metodo}\) \(%{GREEDYDATA:url}\) \(%{GREEDYDATA:parametros}\) \(%{GREEDYDATA:ip}\) \(%{NUMBER:usuarioId}\) \(%{GREEDYDATA:usuarioNome}\) \(%{GREEDYDATA:grupoId}\) \(%{GREEDYDATA:sistemaEnsino}\) \(%{GREEDYDATA:payload}\) \(%{GREEDYDATA:protocolo}\) \(%{GREEDYDATA:aplicacao}\) \(%{GREEDYDATA:autenticacao}\) \(%{NUMBER:tempoRequest}\) \(%{GREEDYDATA:login}\) \(%{GREEDYDATA:grupoNome}\) \(%{GREEDYDATA:grupoSigla}\) \(%{GREEDYDATA:tipoUsuario}\) \(%{GREEDYDATA:email}\) \(%{GREEDYDATA:idExterno}\) \(%{GREEDYDATA:hash}\) \(%{GREEDYDATA:classificacoes}\)" ]
        }

        mutate {
            convert => {
                tempoRequest => "integer"
            }
        }
    }  

    if [type] == "log_auditoria" {

        grok {
            match => [ "message", "%{DATESTAMP:data} \[%{USERNAME:thread}\] %{WORD:level} %{GREEDYDATA:classe} \(%{GREEDYDATA:token}\) \(%{GREEDYDATA:metodo}\) \(%{GREEDYDATA:url}\) \(%{GREEDYDATA:parametros}\) \(%{GREEDYDATA:ip}\) \(%{NUMBER:usuarioId}\) \(%{GREEDYDATA:usuarioNome}\) \(%{GREEDYDATA:grupoId}\) \(%{GREEDYDATA:sistemaEnsino}\) \(%{GREEDYDATA:payload}\) \(%{GREEDYDATA:protocolo}\) \(%{GREEDYDATA:aplicacao}\) \(%{GREEDYDATA:autenticacao}\) \(%{NUMBER:tempoRequest}\) \(%{GREEDYDATA:login}\) \(%{GREEDYDATA:grupoNome}\) \(%{GREEDYDATA:grupoSigla}\) \(%{GREEDYDATA:tipoUsuario}\) \(%{GREEDYDATA:email}\) \(%{GREEDYDATA:idExterno}\) \(%{GREEDYDATA:hash}\) \(%{GREEDYDATA:classificacoes}\) \(%{GREEDYDATA:result}\)" ]
        }

        mutate {
            convert => {
                tempoRequest => "integer"
            }
        }
    }
} 
output {
    stdout {
        codec => rubydebug
    }  
    if "_grokparsefailure" not in [tags] {
        elasticsearch { 
            hosts => ["10.32.136.76:9200", "10.32.136.77:9200", "10.32.136.78:9200", "10.32.136.79:9200"]
         } 
    }
}

I guess my filter has poor performance for some reason, bug or the way it was configured.
Can someone point to me how I can debug to find whats is causing this bad performance?

Cheers!

warkolm · March 11, 2016, 11:19pm

Thats why. You can't use multiple threads with multiline as you may end up with disordered events.

gcherneski · May 2, 2016, 6:31pm

So since Logstash forces pipeline worker threads to 1 because of the use of multiline configuration, is there a recommended architecture/configuration to use for consuming multiline log4j logfiles? I use filebeat to send the logs to an AWS ELB that faces 2 (and eventually more) Logstash servers. The logstash servers are 4 CPU 16GB mem servers and using only 1 of the 4 CPUs is not a cost-effective or scalable model. Please advise.

Christian_Dahlqvist · May 2, 2016, 6:46pm

Multiline processing should ideally be done as close to the source as possible. The latest version of Filebeat has support for multiline processing built in, allowing this to be done before that data reaches Logstash. Another option is to use the multiline codec plugin in Logstash as this does not require a single worker thread to be used.

santhosh244 · April 16, 2017, 6:49am

Hi All,

After moving the multiline in my test environment. Logs were getting multiple faster(atleast 4x times). Yet to put in production. Have you tried this and is there any positive update in CPU improvement of logstash.

Topic		Replies	Views
Grok filter help! Logstash	5	526	September 15, 2018
Logstash cpu utilization is 100% Logstash	1	871	August 11, 2017
Worker thread consumes 100 % CPU (for no reason) Logstash	5	15028	November 4, 2022
Defaulting pipeline worker threads to 1 Logstash	2	2215	January 5, 2017
High cpu usage by worker thread Logstash	8	3999	July 6, 2017

Logstash using 100% of one CPU core

Related topics