Logstash 6.2.4 high CPU usage


(Vitaliy Zhiltsov) #1

Hello. Since we've updated logstash from 5.4.2 to 6.2.4 version we have problem with high using of CPU.
logstash_cpu_last24
As you can see on screenshots this high usage is not depend on message rate:
logstash_kibana_last24
All config files are the same. We use your original docker images.
With INFO log level everything looks fine but log with DEBUG level contains a lot of next messages:

[2018-07-02T09:36:38,928][DEBUG][logstash.config.source.local.configpathloader] Skipping the following files while reading config since they don't match the specified glob pattern {:files=>["/usr/share/logstash/CONTRIBUTORS", "/usr/share/logstash/Gemfile", "/usr/share/logstash/Gemfile.lock", "/usr/share/logstash/LICENSE", "/usr/share/logstash/NOTICE.TXT", "/usr/share/logstash/bin", "/usr/share/logstash/config", "/usr/share/logstash/data", "/usr/share/logstash/lib", "/usr/share/logstash/logstash-core", "/usr/share/logstash/logstash-core-plugin-api", "/usr/share/logstash/modules", "/usr/share/logstash/pipeline", "/usr/share/logstash/tools", "/usr/share/logstash/vendor"]}
[2018-07-02T09:36:38,928][DEBUG][logstash.config.source.local.configpathloader] Reading config file {:config_file=>"/usr/share/logstash/pipeline/logstash.conf"}
[2018-07-02T09:36:38,930][DEBUG][logstash.agent           ] Converging pipelines state {:actions_count=>0}

Less then one minute of log almost 2,5Mb. I'm not sure should I upload it fully or not.

$ cat ./logstash.conf

input {
  beats {
    port => 5044
  }
}
filter {
  if [source] =~ "services" {
    grok {
      match => ["source", "/%{GREEDYDATA}/%{DATA:ServiceName}\_%{GREEDYDATA}\_%{DATA:ServiceVersion}\.0.log"]
      break_on_match => false
      tag_on_failure => [ "failedPattern3" ]
    }
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp}\|%{DATA:TraceID}\|%{DATA:EventID:int}\|%{DATA:Logger}\|%{LOGLEVEL:LogLevel}\|%{GREEDYDATA:Message}"]
      tag_on_failure => [ "failedPattern2" ]
    }
    date {
      timezone => "UTC"
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSSS" ]
      target => "@timestamp"
      remove_field => [ "timestamp" ]
    }
  } else if [source] =~ "docker" {
    mutate {
      add_field => {
        ServiceName => "docker"
        ServiceVersion => "0"
        LogLevel => "INFO"
      }
    }
    grok {
      match => ["message", "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:monthday}T%{TIME:time}Z %{DATA:Logger1} %{DATA:Logger2} %{DATA:TraceID} %{GREEDYDATA:Message}"]
      tag_on_failure => [ "failedPattern4" ]
    }
    mutate {
      add_field => {
        "timestamp" => "%{year}-%{month}-%{monthday} %{time}"
      }
    }
    date {
      locale => "en"
      timezone => "UTC"
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSSSSSSSS", "ISO8601"]
      target => "@timestamp"
      remove_field => ["timestamp", "monthday", "year", "month", "day", "time"]
    }
    mutate {
      add_field => {
        "Logger" => "%{Logger1} %{Logger2}"
      }
      remove_field => [ "Logger2", "Logger1" ]
    }
  } else {
    mutate {
      add_field => {
        ServiceName => "rabbitmq"
        ServiceVersion => "3.6.10"
      }
      gsub => ["message", "==\n", " "]
    }
    grok {
      match => ["message", "=%{LOGLEVEL:LogLevel} REPORT==== %{MONTHDAY:monthday}-%{MONTH:month}-%{YEAR:year}::%{TIME:time} = %{GREEDYDATA:Message}"]
      tag_on_failure => [ "failedPattern1" ]
    }
    mutate {
      add_field => {
        "timestamp" => "%{monthday}/%{month}/%{year}:%{time}"
      }
    }
    date {
      locale => "en"
      timezone => "UTC"
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss"]
      target => "@timestamp"
      remove_field => ["timestamp", "monthday", "year", "month", "day", "time"]
    }
  }
  if "beats_input_codec_plain_applied" in [tags] {
#    mutate {
#      remove_tag => ["beats_input_codec_plain_applied"]
#    }
    mutate {
      replace => ["@version", "%{ServiceVersion}"]
      remove_field => [ "tags", "message",  "host", "type", "input_type", "offset", "beat", "source", "ServiceVersion" ]
      remove_field => [ '[geoip][ip]', '[geoip][location]', '[geoip][latitude]', '[geoip][longitude]' ]
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
  # Remove in production
#  stdout {
#    codec => rubydebug
#  }
}

(Christian Dahlqvist) #2

You are setting the @timestamp field based in data in the events, so the distribution in the Discover application may not accurately reflect what is actually processed during that time frame.

One thing that stands out is that you are using a lot of DATA and GREEDYDATA patterns, which can be quite inefficient. See this blog post for advice on how to tune grok expressions for performance.

I would also recommend installing monitoring so you can see what Logstash is doing and what it is spending most time on.


(Vitaliy Zhiltsov) #3

Thanks for tips. It'll be part of optimizations and improving of speed of whole system.
But as I early said settings are the same and we've changed version of docker image only.
Today I ran copy of our logstash container on the same machine with other port for listening. So with this configuration logstash1 container doesn't get any messages from a filebeat service but still uses CPU:

$ sudo docker stats --no-stream
CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
12b6e6da5775        logstash1           77.46%              552.4MiB / 6.795GiB   7.94%               1.6kB / 1.1kB       1.31MB / 11.1MB     35
711a8364c383        logstash            89.64%              615MiB / 6.795GiB     8.84%               19.9kB / 41.8kB     6.08MB / 11.1MB     41

$cat docker-compose.yaml

version: "2"
services:
  logstash1:
    container_name: logstash1
    image: docker.elastic.co/logstash/logstash:6.2.4
    ports:
      - "5043:5044"
    networks:
      - elk
    volumes:
      - ./logstash/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash/logstash.yml:/usr/share/logstash/config/logstash.yml
networks:
  elk:
    external: true

$cat ./logstash.yml

config.reload.automatic: true
config.reload.interval: 5
http.host: 0.0.0.0
log.level: debug
path.config: /usr/share/logstash/pipeline
pipeline.workers: 1
xpack.monitoring.elasticsearch.password: changeme
xpack.monitoring.elasticsearch.url: http://elasticsearch:9200
xpack.monitoring.elasticsearch.username: logstash_system
xpack.monitoring.enabled: false

When I ran this config on my local machine which doesn't have any other containers (filebeat, elasticsearch, kibana) I got the same effect:

test@ubuntu1:~/elk$ sudo docker stats --no-stream
CONTAINER ID        NAME                                           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
e9d1f7757a2c        logstash1                                      99.27%              470.1MiB / 1.945GiB   23.61%              2.06kB / 716B       0B / 18.9MB         34

(Vitaliy Zhiltsov) #4

I found it!
From my point of view it's breaking change from you!
https://www.elastic.co/guide/en/logstash/5.4/logstash-settings-file.html

config.reload.interval - How often in seconds Logstash checks the config files for changes. - 3

https://www.elastic.co/guide/en/logstash/6.3/logstash-settings-file.html

config.reload.interval - How often in seconds Logstash checks the config files for changes. - 3s


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.