CPU usage of Logstash too high when using the S3 input plugin

We are working on infrastructure built on AWS, and in some special cases logs are stored only in S3 buckets. If we want to use ELK to analyse these logs, we need to use the S3 input plugin [1].

Unfortunately, the CPU usage of the logstash process is very, very high, and the incoming network load (Network In) is also very high - which is surprising considering that we have sincedb_path in use.

[screenshot: 'Network In' metric growing day by day]

In the screenshot above you can see that after I added the S3 access logs (since 5/7/2019 they are also analysed with/sent to ELK), the Network In load grows daily. To me it seems like the S3 input plugin is scanning the whole bucket of S3 access logs over and over (each S3 access log entry appears as one line in a separate file, so there is a huge number of files in that bucket).

Currently we are using 16 S3 input plugins in logstash.conf (in the input section) - please see an example of the configuration below*.

input
{
    s3
    {
        bucket => "<bucket_name>"
        prefix => "production/lb/<path>/elasticloadbalancing/eu-west-1/"
        region => "eu-west-1"
        type => "alblogs"
        codec => plain
        sincedb_path => "/opt/<path>/elasticsearch/plugins/repository-s3/alblogs.txt"
        secret_access_key => "<secret>"
        access_key_id => "<access_key_id>"
    }

...

 filter {
   if [type] == "alblogs" {
      grok {
         match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} (?:%{IP:backend_ip}:%{NUMBER:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{NUMBER:elb_status_code:int}|-) (?:%{NUMBER:backend_status_code:int}|-) %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} \"(?:%{WORD:verb}|-) (?:%{GREEDYDATA:request}|-) (?:HTTP/%{NUMBER:httpversion}|-( )?)\" \"%{DATA:userAgent}\"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?"]
        match => [ "request", "%{UUID:event_uuid}" ]
      }

...

  if [type] == "s3_production" {
    grok {
        match => ["message", "%{NOTSPACE:s3_owner}[ \t](-|%{HOSTNAME:s3_bucket})[ \t]\[%{HTTPDATE:timestamp}\][ \t]%{IP:s3_remote_ip}[ \t]%{NOTSPACE:Requester}[ \t]%{NOTSPACE:RequesterID}[ \t]%{NOTSPACE:s3_operation}[ \t]%{NOTSPACE:s3_key}[ \t]%{NOTSPACE:request_method}[ \t]%{NOTSPACE:request_url}[ \t]%{NOTSPACE:request_protocol}[ \t]%{NUMBER:HTTP_status}[ \t]%{NOTSPACE:s3_errorCode}[ \t]%{NOTSPACE:s3_bytesSent}[ \t]%{NOTSPACE:s3_objectSize}[ \t]%{NUMBER:s3_totalTime}[ \t]%{NOTSPACE:s3_turnaroundTime}[ \t]\"%{NOTSPACE:Referrer}\"[ \t]\"%{GREEDYDATA:UserAgent}\"[ \t]%{NOTSPACE:s3_versionId}[ \t]%{NOTSPACE:s3_hostId}[ \t]%{NOTSPACE:s3_signarureVersion}[ \t]%{NOTSPACE:s3_cipherSuite}[ \t]%{NOTSPACE:s3_authType}[ \t]%{HOSTNAME:s3_hostHeader}[ \t]%{NOTSPACE:s3_TLSversion}"]

        add_tag => [ "production" ]
        tag_on_failure => [ "S3.EXPIRE.OBJECT" ]
    }
    mutate {
        remove_field => [ "message" ]
    }
  }

...

output
{
   if [type] == "alblogs" {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "alblogs-%{+YYYY.MM.dd}"
    }
   }
   if [type] == "s3_production" {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "s3-production-%{+YYYY.MM.dd}"
    }
   }
}

Any advice, please? Is this normal? Could somebody please help?

*The parsing (grok) doesn't seem to be the issue; I have optimized it as much as I could.

[1] https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html

Does this help?


watch_for_new_files
Whether or not to watch for new files. Disabling this option causes the input to close itself after processing the files from a single listing.
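
For reference, a minimal sketch of how that could look in one of the s3 inputs above - the bucket, prefix, sincedb path and keys are the same placeholders as in the original config, and watch_for_new_files is the only new line:

    s3
    {
        bucket => "<bucket_name>"
        prefix => "production/lb/<path>/elasticloadbalancing/eu-west-1/"
        region => "eu-west-1"
        type => "alblogs"
        codec => plain
        sincedb_path => "/opt/<path>/elasticsearch/plugins/repository-s3/alblogs.txt"
        secret_access_key => "<secret>"
        access_key_id => "<access_key_id>"
        # process the files from a single listing, then close instead of
        # continuously re-listing the whole bucket
        watch_for_new_files => false
    }

Since the input closes itself after one listing, picking up newly delivered files would presumably need the pipeline to be restarted or re-run periodically.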

When I was reading the documentation, I thought I would lose the ability to fetch new files.

I hadn't tried it until now, and indeed it helped!

I am fetching all the data I was fetching before, but the CPU usage of the logstash process and the Network In usage are way lower (from 200%+ down to the 10-20% range).

The very last thing bothering me after this nice fix: AWS delivers the logs in bunches, let's say every 5 or 15 minutes, and at those times I still see a peak of CPU usage. Is it possible to distribute this load somehow - say, to process it in smaller pieces?

Thank you for your help @Badger!

You could reduce the number of pipeline worker threads. Other than that, once it has work to do, Logstash will try to do it as quickly as possible.
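
For example, in logstash.yml - the values below are purely illustrative, not tuned recommendations:

    # defaults to the number of CPU cores on the host;
    # lowering it caps how many cores a burst of events can occupy
    pipeline.workers: 2
    # smaller batches per worker can flatten the peaks a little more (default is 125)
    pipeline.batch.size: 50

Fewer workers lower the peak CPU, but each 5/15-minute bunch of logs will simply take longer to drain.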
