Filebeat Memory Leakage Investigation

Hi,

Filebeat version: 5.2.0
OS: CentOS 6.6 and 7.2
Configuration: see the second post

In our production environment I'm investigating a memory leak in the filebeat process.

So far I've deduced (using go tool pprof) that the extra memory consumption is all located in the harvester.Harvest function, in big blobs (see the list output below).

  1. Might this be a configuration "error"?
  2. How should I proceed to investigate the root cause?

Kind regards,

Arne


pprof output:

$ go tool pprof <exe_path> http://<pprof_listener>/debug/pprof/heap
Fetching profile from http://<pprof_listener>/debug/pprof/heap
Saved profile in <profile_save_path>
Entering interactive mode (type "help" for commands)
(pprof) top
967.37MB of 972.89MB total (99.43%)
Dropped 250 nodes (cum <= 4.86MB)
Showing top 10 nodes out of 21 (cum >= 28.53MB)
      flat  flat%   sum%        cum   cum%
  917.22MB 94.28% 94.28%   917.22MB 94.28%  github.com/elastic/beats/filebeat/harvester.(*Harvester).Harvest
   23.80MB  2.45% 96.72%    28.03MB  2.88%  compress/flate.NewWriter
   17.10MB  1.76% 98.48%    17.10MB  1.76%  github.com/elastic/beats/filebeat/spooler.(*Spooler).flush
    8.75MB   0.9% 99.38%     8.75MB   0.9%  github.com/elastic/beats/vendor/github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
    0.50MB 0.052% 99.43%    28.53MB  2.93%  github.com/elastic/beats/vendor/github.com/Shopify/sarama.encode
         0     0% 99.43%    28.03MB  2.88%  compress/gzip.(*Writer).Write
         0     0% 99.43%   917.22MB 94.28%  github.com/elastic/beats/filebeat/prospector.(*Prospector).startHarvester.func1
         0     0% 99.43%    17.10MB  1.76%  github.com/elastic/beats/filebeat/spooler.(*Spooler).run
         0     0% 99.43%    28.53MB  2.93%  github.com/elastic/beats/vendor/github.com/Shopify/sarama.(*Broker).Produce
         0     0% 99.43%    28.53MB  2.93%  github.com/elastic/beats/vendor/github.com/Shopify/sarama.(*Broker).send
(pprof) list Harvest
Total: 972.89MB
ROUTINE ======================== github.com/elastic/beats/filebeat/harvester.(*Harvester).Harvest in /home/ademmers/gopath/src/github.com/elastic/beats/filebeat/harvester/log.go
  917.22MB   917.22MB (flat, cum) 94.28% of Total
         .          .     88:           case <-h.done:
         .          .     89:                   return
         .          .     90:           default:
         .          .     91:           }
         .          .     92:
         .          .     93:           message, err := r.Next()
         .          .     94:           if err != nil {
         .          .     95:                   switch err {
         .          .     96:                   case ErrFileTruncate:
         .          .     97:                           logp.Info("File was truncated. Begin reading file from offset 0: %s", h.state.Source)
         .          .     98:                           h.state.Offset = 0
         .          .     99:                           filesTruncated.Add(1)
         .          .    100:                   case ErrRemoved:
         .          .    101:                           logp.Info("File was removed: %s. Closing because close_removed is enabled.", h.state.Source)
         .          .    102:                   case ErrRenamed:
         .          .    103:                           logp.Info("File was renamed: %s. Closing because close_renamed is enabled.", h.state.Source)
         .          .    104:                   case ErrClosed:
         .          .    105:                           logp.Info("Reader was closed: %s. Closing.", h.state.Source)
         .          .    106:                   case io.EOF:
         .          .    107:                           logp.Info("End of file reached: %s. Closing because close_eof is enabled.", h.state.Source)
         .          .    108:                   case ErrInactive:
         .          .    109:                           logp.Info("File is inactive: %s. Closing because close_inactive of %v reached.", h.state.Source, h.config.CloseInactive)
         .          .    110:                   default:
         .          .    111:                           logp.Err("Read line error: %s; File: ", err, h.state.Source)
         .          .    112:                   }
         .          .    113:                   return
         .          .    114:           }
         .          .    115:
         .          .    116:           // Strip UTF-8 BOM if beginning of file
         .          .    117:           // As all BOMS are converted to UTF-8 it is enough to only remove this one
         .          .    118:           if h.state.Offset == 0 {
         .          .    119:                   message.Content = bytes.Trim(message.Content, "\xef\xbb\xbf")
         .          .    120:           }
         .          .    121:
         .          .    122:           // Update offset
         .          .    123:           h.state.Offset += int64(message.Bytes)
         .          .    124:
         .          .    125:           // Create state event
  546.12MB   546.12MB    126:           event := input.NewEvent(h.getState())
         .          .    127:
  371.11MB   371.11MB    128:           text := string(message.Content)
         .          .    129:
         .          .    130:           // Check if data should be added to event. Only export non empty events.
         .          .    131:           if !message.IsEmpty() && h.shouldExportLine(text) {
         .          .    132:                   event.ReadTime = message.Ts
         .          .    133:                   event.Bytes = message.Bytes
ROUTINE ======================== github.com/elastic/beats/filebeat/prospector.(*Prospector).startHarvester.func1 in /home/ademmers/gopath/src/github.com/elastic/beats/filebeat/prospector/prospector.go
         0   917.22MB (flat, cum) 94.28% of Total
         .          .    243:                   atomic.AddUint64(&p.harvesterCounter, ^uint64(0))
         .          .    244:                   p.wg.Done()
         .          .    245:           }()
         .          .    246:
         .          .    247:           // Starts harvester and picks the right type. In case type is not set, set it to defeault (log)
         .   917.22MB    248:           h.Harvest(reader)
         .          .    249:   }()
         .          .    250:
         .          .    251:   return nil
         .          .    252:}
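
For reference, here is a minimal sketch of how a Go process exposes the /debug/pprof/heap endpoint fetched above. Filebeat sets this up when started with an HTTP profiling listener; the standalone program below is only illustrative, not filebeat code.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Anything served on this address now answers /debug/pprof/heap,
	// which is the URL that `go tool pprof` fetches in the session above.
	log.Fatal(http.ListenAndServe("127.0.0.1:6060", nil))
}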

Configuration:

  • /etc/filebeat/filebeat.yml
filebeat.config_dir: /etc/filebeat/conf.d
filebeat.publish_async: true
filebeat.shutdown_timeout: 10s
filebeat.spool_size: 65536
logging.files:
    keepfiles: 7
    rotateeverybytes: 10485760
logging.level: info
logging.to_files: true
logging.to_syslog: false
output.kafka:
    bulk_max_size: 2048
    enabled: true
    hosts:
    - <list_of_3_kafka_nodes>
    keep_alive: 180s
    timeout: 30s
    topic: <kafka_topic>
    version: '0.10'
    worker: 8
  • /etc/filebeat/conf.d/service_1.yml
filebeat.prospectors:
  - paths: [ <java_based_app_log_running_on_tomcat7_using_log4j2> ]
    document_type: tomcatlog
    input_type: log
    fields:
      type: service
      subtype: <service_1_reference>
      application: <service_1_name>
      env: <runtime_environment>
    harvester_limit: 10
    ignore_older: 3h
    close_inactive: 2h
    clean_inactive: 4h
    close_removed: true
    clean_removed: true
    close_renamed: true
    close_timeout: 2h
    multiline:
      pattern: '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
      negate: true
      match: after

  - paths: [ <tomcat_access_log> ]
    document_type: accesslog
    input_type: log
    fields:
      type: access-log
      subtype: <service_1_reference>
      application: <service_1_name>
      env: <runtime_environment>
    harvester_limit: 10
    ignore_older: 3h
    close_inactive: 2h
    clean_inactive: 4h
    close_removed: true
    clean_removed: true
    close_renamed: true
    close_timeout: 2h

The problem is with filebeat.publish_async: true.

OK, thanks for the quick reply. I will reconfigure several nodes and observe their behaviour over the coming days.

Still, I can understand perfectly why async publishing could lead to higher memory usage while busy or while waiting for ACKs, but shouldn't the Go GC clean out that extra memory once the ACKs are received?
In other words: doesn't this sound like a memory leak? (Or is this a known leak that I missed when searching older issues?)

Well, it's a known memory leak created by unfortunate interactions between multiple outputs, load balancing, buffering for async publishing, the ordered-ACK requirement in filebeat, and potentially error handling in the outputs. Batches must be ACKed in order, but are sent out of order... this can lead to the output pipeline turning into a very large buffer that is never cleaned up, or only cleaned up eventually (but not before an OOM)...
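
To make that concrete, here is a toy sketch (not filebeat's actual pipeline code; the batch type and sizes are invented) of how out-of-order completion combined with an in-order release requirement pins memory: everything behind the oldest unfinished batch stays referenced, so the GC cannot reclaim it even though later batches are long done.

package main

import "fmt"

// batch stands in for a spooler batch whose event data must stay in memory
// until the batch is released (ACKed) in order.
type batch struct {
	id   int
	data []byte // event payloads held until release
	done bool   // has the output finished with this batch?
}

func main() {
	var pending []*batch // ordered publish window

	// Five batches are sent out; every one except the first completes quickly.
	for i := 0; i < 5; i++ {
		pending = append(pending, &batch{id: i, data: make([]byte, 1<<20)})
	}
	for _, b := range pending[1:] {
		b.done = true // out-of-order completion
	}

	// In-order release: stop at the first batch that is not done yet.
	released := 0
	for _, b := range pending {
		if !b.done {
			break
		}
		released++
	}
	pending = pending[released:]

	// Prints "released 0 batches, 5 still pinned in memory": one slow batch
	// at the head keeps all later (finished) batches, and their data, alive.
	fmt.Printf("released %d batches, %d still pinned in memory\n",
		released, len(pending))
}

With publish_async and load balancing, this pending window can keep growing for as long as one slow or failing batch blocks the head, which would explain why the allocations made in Harvest dominate the heap profile above.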

OK, got it. I'll live with non-async publishing for now.

Thanks!
