Insufficient throughput from Filebeat

steffens · January 20, 2016, 10:10am

Hi,

Are you using network shares, or are these files all local? Have you checked that all files are actually being forwarded? Can you share your filebeat config file?

I'd like to run some tests first. Unfortunately you will need 1.1 for these tests, as that is where we started adding counters in critical places.

Here is a Python script that collects these counters and computes rates:

import argparse
import curses
import time

import requests


def run(stdscr, url):
    last_vals = {}

    # running average over the last N per-second measurements
    N = 30
    avg_vals = {}
    now = time.time()

    while True:
        try:
            time.sleep(1.0)
            stdscr.erase()

            r = requests.get(url)
            data = r.json()

            last = now
            now = time.time()
            dt = now - last

            for key, total in data.items():
                # only top-level numeric counters (skips bools and nested objects)
                if isinstance(total, (int, float)) and not isinstance(total, bool):
                    if key in last_vals:
                        per_sec = (total - last_vals[key]) / dt
                        window = avg_vals.setdefault(key, [])
                        window.append(per_sec)
                        if len(window) > N:
                            window.pop(0)
                        avg_sec = sum(window) / len(window)
                        per_sec = "{:.1f}".format(per_sec)
                        avg_sec = "{:.1f}".format(avg_sec)
                    else:
                        per_sec = "na"
                        avg_sec = "na"
                    last_vals[key] = total
                    try:
                        stdscr.addstr("{}: {}/s (avg: {}/s) (total: {})\n"
                                      .format(key, per_sec, avg_sec, total))
                    except curses.error:
                        pass  # terminal too small to show every counter
            stdscr.refresh()
        except requests.ConnectionError:
            stdscr.addstr("Waiting for connection...\n")
            stdscr.refresh()
            last_vals = {}
            avg_vals = {}


def main():
    parser = argparse.ArgumentParser(
        description="Print per second stats from expvars")
    parser.add_argument("url",
                        help="The URL from where to read the values")
    args = parser.parse_args()

    # curses.wrapper enables cbreak/noecho and restores the terminal on exit
    try:
        curses.wrapper(run, args.url)
    except KeyboardInterrupt:
        pass


if __name__ == "__main__":
    main()
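The rate bookkeeping inside the loop can also be pulled out into a small helper, which makes it easy to test without curses or a running beat. This is a minimal sketch, not part of filebeat: the class name `RateTracker` is hypothetical, and it swaps the list slicing for `collections.deque(maxlen=n)`, which drops the oldest sample automatically.

```python
import collections


class RateTracker:
    """Hypothetical helper: per-second rate plus running average of the last n samples."""

    def __init__(self, n=30):
        self.n = n
        self.last_vals = {}
        self.windows = {}

    def update(self, key, total, dt):
        """Record a new counter total; return (per_sec, avg_sec), or (None, None) on first sight."""
        if key not in self.last_vals:
            self.last_vals[key] = total
            return None, None
        per_sec = (total - self.last_vals[key]) / dt
        self.last_vals[key] = total
        # a deque with maxlen discards the oldest sample once it is full
        window = self.windows.setdefault(key, collections.deque(maxlen=self.n))
        window.append(per_sec)
        return per_sec, sum(window) / len(window)

    def reset(self):
        """Forget all history, e.g. after a connection error."""
        self.last_vals.clear()
        self.windows.clear()


if __name__ == "__main__":
    t = RateTracker(n=2)
    print(t.update("lines", 0, 1.0))    # (None, None)
    print(t.update("lines", 100, 1.0))  # (100.0, 100.0)
    print(t.update("lines", 300, 1.0))  # (200.0, 150.0)
    print(t.update("lines", 600, 1.0))  # (300.0, 250.0)
```

In the main loop, `tracker.update(key, total, dt)` would replace the `last_vals`/`avg_vals` handling, and `tracker.reset()` the two resets in the `ConnectionError` branch.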

The counters are queried via HTTP. I assume you don't want an open port, so we make the beat bind the port to localhost only (in this case the Python script must be run on the same machine).

run.sh:

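#!/bin/sh
# Registry file for the test run; keep it in sync with the registry path in your test config
REGISTRY=.filebeat
# Remove it so the files are read again from the start on every run
rm -f "$REGISTRY"
filebeat -e -httpprof 127.0.0.1:6060 -c "$1"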

The -httpprof 127.0.0.1:6060 option starts a small HTTP server that exposes the counters. The script expects the path of the configuration file to run filebeat with as its only argument.

Start the Python script like this (it can stay running the whole time):

$ python expvar_rates.py http://localhost:6060/debug/vars
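If you only want a single reading, for example to confirm the endpoint is reachable before starting the curses display, a one-shot fetch with the standard library is enough. This is a minimal sketch assuming the endpoint returns a JSON object whose top-level numeric values are the counters (the same assumption the script above makes); `numeric_counters` and `snapshot` are hypothetical names.

```python
import json
import urllib.request


def numeric_counters(data):
    """Keep only top-level numeric values; bools, lists and nested objects are skipped."""
    return {k: v for k, v in data.items()
            if isinstance(v, (int, float)) and not isinstance(v, bool)}


def snapshot(url, timeout=5.0):
    """Fetch the expvar JSON once and return its numeric counters."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return numeric_counters(json.loads(resp.read().decode("utf-8")))
```

For example, `snapshot("http://localhost:6060/debug/vars")` returns a dict mapping counter names to their current totals, or raises `urllib.error.URLError` if filebeat isn't running yet.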

This exposes quite a few variables. To begin with, I'd like to reduce the setup to get an idea of raw I/O throughput. To do so, take your config file, comment out the logstash output plugin and enable the console output plugin instead. Also reconfigure the registry path (and adapt the shell script accordingly), so you don't overwrite your actual registry.

With console output enabled, it's a good idea to call run.sh like this:

$ ./run.sh test.yml > /dev/null
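As a rough cross-check that does not rely on the counters, you could pipe the console output into a line counter instead of discarding it. This is a minimal sketch, assuming the console output writes exactly one event per line (no pretty-printing); `line_rates` is a hypothetical helper, and the clock is injectable so it can be tested.

```python
import time


def line_rates(stream, interval=1.0, clock=time.monotonic):
    """Yield (lines, elapsed_seconds) each time at least `interval` seconds have passed."""
    start = clock()
    count = 0
    for _ in stream:
        count += 1
        now = clock()
        if now - start >= interval:
            yield count, now - start
            start, count = now, 0
    # report the remaining partial interval when the stream ends
    if count:
        yield count, clock() - start
```

For example, with `import sys` and `for n, secs in line_rates(sys.stdin): print(n / secs, "lines/s", file=sys.stderr)` appended to a file such as `count_rate.py` (hypothetical name), you could run `./run.sh test.yml | python3 count_rate.py`. Rates are only reported when a line arrives, so an idle stream prints nothing.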

Consider running the test (without redirecting to /dev/null) with only one prospector configured at a time, to rule out log files being missed due to a faulty config.
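To check that no files are missed, you can also work out how many lines a prospector's paths should yield and compare that with the totals filebeat reports. This is a minimal sketch, assuming you copy the glob patterns from your prospector's `paths` by hand (it does not read the filebeat config); `expected_lines` is a hypothetical helper. Filebeat may not ship a final line that lacks a trailing newline, so expect a possible difference of one per such file.

```python
import glob
import os


def expected_lines(patterns):
    """Return ({path: line_count}, total) for regular files matched by the glob patterns."""
    counts = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            # skip directories and files already matched by an earlier pattern
            if path in counts or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                counts[path] = sum(1 for _ in f)
    return counts, sum(counts.values())
```

Usage: `counts, total = expected_lines(["/var/log/myapp/*.log"])`, where the pattern is only an example; `counts` shows which files matched at all.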

Running this test with the NASA HTTP logs, I get about 75k lines/s with the default configuration.
