Filebeat not keeping up with logs


#1
config:
filebeat:
  prospectors:
    -
      paths:
        - /apps/logs/All.log
      input_type: log
      document_type: prod
      scan_frequency: 5s
      harvester_buffer_size: 32768
      force_close_files: true
      publish_async: true
output:
  logstash:
    hosts: ["logstash01.corp.pvt:5044", "logstash01.corp.pvt:5052"]
    worker: 16
    loadbalance: true
shipper:
logging:
  to_files: false
  files:
    path: /apps/logs/tibco/logstash
    name: mybeat
    keepfiles: 7

logstash config:

input{
  beats{
    port => 5050
  }
}

filter{
    metrics {
        meter => "events"
        add_tag => "metric"
        flush_interval => 60
    }
}

output {
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "rate: %{[events][rate_1m]}"
      }
    }
  }
  null {}
}

sample of results:
rate: 4729.901062624514
rate: 4864.084502204622
rate: 4800.204394300626

I feel this is very slow, and I have tried many modifications to find performance improvements, to no avail. I have also attempted this via localhost to take the network out of the mix, and output to /dev/null to take disk I/O out of the mix. Nothing seems to help. The following are some of the filebeat settings I have varied:

compression: 0, 1, 3, 5, 9
harvester: default, 262144, 512288
async: off, on
max_bulk_size: default, 3000, 4096, 12288
scan_frequency: 0, 1, 5, 15
workers: 8, 16 (logstash piece is on a 16 core physical)

Am I missing something major? Do those rate results mean ~4,500 lines per minute, or is that per second averaged over a minute? I have read about people achieving 18k lines per second or more and cannot get anywhere near that. Here is an example of a piece of the log:

    <ns0:Description>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;tns:validateAddAndGeo xmlns:tns="http://session.com/"&gt;
    &lt;validateAddRequest&gt;
        &lt;userId&gt;burns&lt;/userId&gt;
        &lt;crisId&gt;12345&lt;/crisId&gt;
        &lt;applicationId&gt;Aggregator&lt;/applicationId&gt;
        &lt;trackingId/&gt;
        &lt;streetAddressLine1&gt;742 Evergreen Terrace&lt;/streetAddressLine1&gt;
        &lt;streetAddressLine2/&gt;
        &lt;city&gt;Springfield&lt;/city&gt;
        &lt;state&gt;IDK&lt;/state&gt;
        &lt;zipCode&gt;
                **********
        &lt;/zipCode&gt;
    &lt;/validateAddRequest&gt;
&lt;/tns:validateAddAndGeo&gt;</ns0:Description>
</ns0:DebugLogRecord>

Any help is appreciated, as I'm kind of stuck.


(Steffen Siering) #2

The metrics filter averages over time; rate_1m is a one-minute moving average expressed in events per second. You have to wait quite a while for these values to stabilize before they give a reliable picture of throughput.

There are many factors in filebeat performance. Just sending files to /dev/null on a physical machine, I was able to process about 95k events per second. Filebeat throughput depends on disk I/O (unless the files are still buffered in OS caches) and on downstream performance: if sending directly to Elasticsearch, on Elasticsearch indexing performance; if sending to Logstash, on processing time within Logstash plus everything even further downstream. This is because the outputs generate back-pressure when they cannot keep up, slowing down event generation in filebeat (as we don't want to drop any events).

See this related post for measuring throughput directly from filebeat: Filebeat sending data to Logstash seems too slow

To get a first idea of performance, I'd run these tests:

  1. Have filebeat publish to the console and redirect to /dev/null. Alternatively, configure the file output in filebeat.
  2. Run filebeat->logstash, but configure only the logstash stdout plugin with the 'dots' codec. Use pv to measure throughput.

The advantages of 'pv' over the metrics plugin are:

  • no additional filters/computations in logstash
  • more instantaneous rate
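
A sketch of test 2. The config file name and the logstash install path are illustrative; the port matches the beats input above, and pv's -W (wait for first byte), -a (average rate), and -r (current rate) flags keep the display focused on throughput:

    # dots.conf - one dot printed per event, nothing else
    input  { beats  { port  => 5050 } }
    output { stdout { codec => dots } }

    bin/logstash -f dots.conf | pv -War > /dev/null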

#3

I will look into pv, but lack of root access may restrict my ability to use that tool. As for going to null from filebeat, I did try that. I was not able to gather exact metrics, but I can tell it is not keeping up with demand because it holds rotated files open (visible with lsof) even after the files are deleted.

I am looking at ~5 GB of logs created every 10 minutes, and I did not think this would be close to filebeat's limits, but the fact that filebeat cannot simply tail a file and write to null without falling behind is a bit worrying. I even commented out my multiline statement to rule that out, and it still falls behind.


(Steffen Siering) #4

That is not a good indicator. Which filebeat version are you using? Filebeat 1.2+ (I think) has a 'close_older' option (default 1h). Filebeat holds files open for this long after reading the last line, in case more content is added. If your services rotate the log file every 10 minutes, for example, you can safely set close_older to 10 minutes. Use the python script posted in the linked thread to determine whether filebeat is still sending/processing data.
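
For example, a prospector with a shorter close_older (option name per the filebeat 1.x docs; verify against your exact version) might look like:

    filebeat:
      prospectors:
        -
          paths:
            - /apps/logs/All.log
          input_type: log
          close_older: 10m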


#5

Running filebeat 1.2:

filebeat:
  spool_size: 8196
  prospectors:
    -
      paths:
        - /apps/tmp/My.log
      input_type: log
      document_type: tibcoprod
      scan_frequency: 0s
      force_close_files: true
output:
   console:

shipper:
logging:
  to_files: false
  files:
    path: /apps/logs/tibco/logstash
    name: mybeat
    keepfiles: 7

the command i am starting it up with:
./filebeat -e -c /apps/tmp/nullFilebeat.yml 2>/dev/null | /apps/pv/bin/pv > /dev/null

results are:
904MiB 0:02:00 [8.48MiB/s]

450 MB per minute seems VERY slow for simply tailing a file and doing nothing with it. The server this is on is very beefy, with 64 CPUs and 132 GB of RAM, and iostat shows nothing wrong there either.
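
For a rough sense of what 8.48 MiB/s means in lines per second, divide the byte rate by the average line size. The 512 bytes/line used here is an assumption, not a measured value for this log:

    # Convert a measured byte rate to an approximate line rate.
    # 512 bytes/line is an assumed average; measure the real value with:
    #   awk '{ sum += length($0) + 1 } END { print sum / NR }' All.log
    rate_mib=8.48
    lines_per_sec=$(echo "$rate_mib" | awk '{ printf "%d", ($1 * 1024 * 1024) / 512 }')
    echo "$lines_per_sec lines/sec"

With large multiline XML entries like the sample above, the real average line size is likely higher, and the effective line rate correspondingly lower.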


(ruflin) #6

It is a bad idea to set scan_frequency to 0, as this will eat up all your CPU. Set it to at least 1s.


#7

I have tried that as well; I have tried setting it to 60. It only scans the single file, and though the file rotates, it should not need to check very often for new files. The issue really seems to be that the harvester just cannot ingest the log fast enough. If I spread the output across multiple logstash servers, I see zero improvement from filebeat; the logstash servers each just take less load instead of filebeat sending more.

Thanks for the advice, though; I will change it back to 10s for now.


(Steffen Siering) #8

What is the exact config you use to push to multiple logstash instances? Your original config won't publish to multiple instances concurrently.

Filebeat reads files and separates lines in order to create one event/document per line. If the file is still in the cache, reading will be much faster. For comparison, can you use '-httpprof' and the python script from the linked thread, and run filebeat like:

./filebeat -e -c /apps/tmp/nullFilebeat.yml -httpprof :6060 >/dev/null 2>&1

The python script will print the event rate, which matches the lines published per second.
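
The rate computation itself is simple: poll the counters exposed at the /debug/vars expvar endpoint twice and divide the delta by the interval. A sketch of that arithmetic, with placeholder counter values standing in for two real polls:

    # Sketch of the event-rate computation: two polls of a published-events
    # counter, 10 seconds apart. The values below are placeholders, not
    # real filebeat output; read the actual counters from /debug/vars.
    c1=120000   # counter at first poll (placeholder)
    c2=165000   # counter 10 seconds later (placeholder)
    dt=10
    rate=$(( (c2 - c1) / dt ))
    echo "$rate events/sec"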

Try:

filebeat:
  spool_size: 8196
  publish_async: true
  prospectors:
    -
      paths:
        - /apps/tmp/My.log
      input_type: log
      document_type: tibcoprod
      scan_frequency: 1s

output:
   console:

shipper:

logging:
  to_files: false
  files:
    path: /apps/logs/tibco/logstash
    name: mybeat
    keepfiles: 7
