Logstash using 100% of one CPU core

Hi,

I'm using Logstash to parse two log files, and I use the multiline filter on one of them.
I'm seeing high CPU usage (100% of one CPU core), but the only message in my log is this:

tail -n40 -f /var/log/logstash/logstash.log 
{:timestamp=>"2016-03-11T16:33:11.144000-0300", :message=>"Defaulting pipeline worker threads to 1 because there are some filters that might not work with multiple worker threads", :count_was=>4, :filters=>["multiline"], :level=>:warn}

And follow my config file:

input { 
    file { 
        path => "/appdata/logs/passaporte/passaporte.log" 
        type => 'log_passaporte' 
    }   
    file {
        path => "/appdata/logs/passaporte/auditoria.log"
        type => 'log_auditoria'
    }
} 
filter{ 

    if [type] == "log_passaporte" {
        multiline {
            pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)"
            what => "previous"
        }           
        
        grok {
            match => [ "message", "%{DATESTAMP:data} \[%{USERNAME:thread}\] %{WORD:level} %{GREEDYDATA:classe} \(%{GREEDYDATA:token}\) \(%{GREEDYDATA:metodo}\) \(%{GREEDYDATA:url}\) \(%{GREEDYDATA:parametros}\) \(%{GREEDYDATA:ip}\) \(%{NUMBER:usuarioId}\) \(%{GREEDYDATA:usuarioNome}\) \(%{GREEDYDATA:grupoId}\) \(%{GREEDYDATA:sistemaEnsino}\) \(%{GREEDYDATA:payload}\) \(%{GREEDYDATA:protocolo}\) \(%{GREEDYDATA:aplicacao}\) \(%{GREEDYDATA:autenticacao}\) \(%{NUMBER:tempoRequest}\) \(%{GREEDYDATA:login}\) \(%{GREEDYDATA:grupoNome}\) \(%{GREEDYDATA:grupoSigla}\) \(%{GREEDYDATA:tipoUsuario}\) \(%{GREEDYDATA:email}\) \(%{GREEDYDATA:idExterno}\) \(%{GREEDYDATA:hash}\) \(%{GREEDYDATA:classificacoes}\)" ]
        }

        mutate {
            convert => {
                tempoRequest => "integer"
            }
        }
    }  

    if [type] == "log_auditoria" {

        grok {
            match => [ "message", "%{DATESTAMP:data} \[%{USERNAME:thread}\] %{WORD:level} %{GREEDYDATA:classe} \(%{GREEDYDATA:token}\) \(%{GREEDYDATA:metodo}\) \(%{GREEDYDATA:url}\) \(%{GREEDYDATA:parametros}\) \(%{GREEDYDATA:ip}\) \(%{NUMBER:usuarioId}\) \(%{GREEDYDATA:usuarioNome}\) \(%{GREEDYDATA:grupoId}\) \(%{GREEDYDATA:sistemaEnsino}\) \(%{GREEDYDATA:payload}\) \(%{GREEDYDATA:protocolo}\) \(%{GREEDYDATA:aplicacao}\) \(%{GREEDYDATA:autenticacao}\) \(%{NUMBER:tempoRequest}\) \(%{GREEDYDATA:login}\) \(%{GREEDYDATA:grupoNome}\) \(%{GREEDYDATA:grupoSigla}\) \(%{GREEDYDATA:tipoUsuario}\) \(%{GREEDYDATA:email}\) \(%{GREEDYDATA:idExterno}\) \(%{GREEDYDATA:hash}\) \(%{GREEDYDATA:classificacoes}\) \(%{GREEDYDATA:result}\)" ]
        }

        mutate {
            convert => {
                tempoRequest => "integer"
            }
        }
    }
} 
output {
    stdout {
        codec => rubydebug
    }  
    if "_grokparsefailure" not in [tags] {
        elasticsearch { 
            hosts => ["10.32.136.76:9200", "10.32.136.77:9200", "10.32.136.78:9200", "10.32.136.79:9200"]
         } 
    }
}

I guess my filter performs poorly for some reason, either a bug or the way it was configured.
Can someone point me to how I can debug this to find what is causing the bad performance?

Cheers!

That's why. You can't use multiple worker threads with the multiline filter, as you could end up with events out of order.

So since Logstash forces pipeline worker threads to 1 because of the use of multiline configuration, is there a recommended architecture/configuration to use for consuming multiline log4j logfiles? I use filebeat to send the logs to an AWS ELB that faces 2 (and eventually more) Logstash servers. The logstash servers are 4 CPU 16GB mem servers and using only 1 of the 4 CPUs is not a cost-effective or scalable model. Please advise.

Multiline processing should ideally be done as close to the source as possible. The latest version of Filebeat has support for multiline processing built in, allowing this to be done before that data reaches Logstash. Another option is to use the multiline codec plugin in Logstash as this does not require a single worker thread to be used.
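For example, the file input from the original config could use the multiline codec instead of the multiline filter. This is a sketch adapted from the config above (same pattern and `what => "previous"` behavior); with the codec, line joining happens per input, so the pipeline is no longer forced down to a single worker thread:

```
input {
    file {
        path => "/appdata/logs/passaporte/passaporte.log"
        type => "log_passaporte"
        # Join stack-trace continuation lines onto the previous event,
        # using the same pattern as the original multiline filter.
        codec => multiline {
            pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)"
            what => "previous"
        }
    }
}
```

With this in place, the `multiline { ... }` block can be removed from the filter section, and Logstash should default back to one worker per CPU core. The equivalent in Filebeat would be its `multiline` settings on the prospector/input that ships `passaporte.log`.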

Hi All,

After moving the multiline processing in my test environment, logs were getting processed noticeably faster (at least 4x). I have yet to put it in production. Has anyone else tried this, and did you see a similar improvement in Logstash CPU usage?