Filebeat sending data to Logstash seems too slow

Why not set bulk_max_size to 2048? In my tests, 2k and 4k batches gave much better throughput by comparison. With that many workers on a physical machine with 16 cores, I measured around 45k lines/s.
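For reference, a minimal sketch of the relevant logstash output settings in filebeat.yml (host, port, worker count and batch size are placeholder values to adapt; the exact option layout can differ between filebeat versions):

output:
  logstash:
    hosts: ["logstash-host:5044"]   # placeholder host/port
    worker: 4                       # parallel connections per host
    loadbalance: true               # spread batches across all workers/hosts
    bulk_max_size: 2048             # max events per batch sent to logstash

Larger batches amortize the per-request overhead across more events, which is why 2k to 4k batches tend to do better than small ones.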

Some ways to measure performance and look for bottlenecks:

  1. Have filebeat publish to the console and redirect the output to /dev/null. Alternatively, configure the file output in filebeat.
  2. Have filebeat->logstash, but configure only the logstash stdout plugin with the 'dots' codec and use pv to measure throughput (a pipeline sketch follows this list).
  3. Run filebeat->logstash->elasticsearch and measure throughput.
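For step 2, a minimal Logstash pipeline sketch (the beats port is a placeholder, and there are deliberately no filters so only input/output throughput is measured); each event is printed as a single dot:

input {
  beats {
    port => 5044    # placeholder port, must match the filebeat logstash output
  }
}

output {
  # one dot per event, nothing else
  stdout { codec => dots }
}

Since every event becomes exactly one byte on stdout, piping Logstash through pv turns its bytes/s rate into events/s (dots.conf is the placeholder name for the pipeline above):

bin/logstash -f dots.conf | pv -War > /dev/null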

If throughput in 2 is higher than in 3, there is a problem with indexing performance in elasticsearch (try increasing the workers in the elasticsearch output; a sketch follows below). If throughput in 1 is higher than in 2, the problem is either the network or logstash performance (any complicated filters taking long?).
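As a sketch of that first tuning knob (option names are version dependent; older Logstash releases accept a per-plugin workers setting on outputs, while newer releases tune parallelism via pipeline.workers in logstash.yml instead):

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder
    workers => 4                  # more output workers for bulk indexing (older Logstash releases)
  }
}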

Have you checked resource usage on the logstash machine?

You can measure throughput inside filebeat by enabling profiling support with -httpprof :6060 or -httpprof localhost:6060 (use the latter if you want the HTTP port to be reachable from localhost only).
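For example (the config path is just a placeholder):

filebeat -e -c /etc/filebeat/filebeat.yml -httpprof localhost:6060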

Then use this python script to collect/measure throughput:

expvar_rates.py

import requests
import argparse
import time
import curses


def main():
    parser = argparse.ArgumentParser(
        description="Print per second stats from expvars")
    parser.add_argument("url",
                        help="The URL from where to read the values")
    args = parser.parse_args()

    # put the terminal into curses mode for a continuously refreshed display
    stdscr = curses.initscr()
    curses.noecho()
    curses.cbreak()

    # last absolute counter values, used to compute per-second deltas
    last_vals = {}

    # running average over the last 30 measurements
    N = 30
    avg_vals = {}
    now = time.time()

    try:
        while True:
            time.sleep(1.0)
            stdscr.erase()

            try:
                r = requests.get(args.url)
                data = r.json()
            except requests.ConnectionError:
                stdscr.addstr("Waiting for connection...\n")
                stdscr.refresh()
                last_vals = {}
                avg_vals = {}
                continue

            last = now
            now = time.time()
            dt = now - last

            for key, total in data.items():
                # only numeric expvars can be turned into rates
                if isinstance(total, (int, float)):
                    if key in last_vals:
                        per_sec = (total - last_vals[key]) / dt
                        if key not in avg_vals:
                            avg_vals[key] = []
                        avg_vals[key].append(per_sec)
                        if len(avg_vals[key]) > N:
                            avg_vals[key] = avg_vals[key][1:]
                        avg_sec = sum(avg_vals[key]) / len(avg_vals[key])
                    else:
                        per_sec = "na"
                        avg_sec = "na"
                    last_vals[key] = total
                    stdscr.addstr("{}: {}/s (avg: {}/s) (total: {})\n"
                                  .format(key, per_sec, avg_sec, total))
            stdscr.refresh()
    finally:
        # restore normal terminal settings, also on Ctrl-C
        curses.nocbreak()
        curses.echo()
        curses.endwin()


if __name__ == "__main__":
    main()

Run the script via python expvar_rates.py http://localhost:6060/debug/vars (the expvar counters are served on the /debug/vars path) to get a live view of per-second rates for filebeat's internal counters.
