I am using the latest versions of Filebeat, Logstash, and Elasticsearch on Ubuntu 18.04 machines.
I have:
2 Filebeat VMs, each with 8 CPU cores and 16 GB memory
3 Logstash VMs, each with 24 CPU cores and 64 GB memory (31 GB heap)
3 Elasticsearch VMs, each with 16 CPU cores and 64 GB memory (31 GB heap)
filebeat.yml (same on both machines; they share a physical SSD mounted at '/mnt/data', but each has its own allocated space and partition)
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /mnt/data/*.csv  # This directory contains 502 CSVs with a total of 780 million (780,000,000) lines and 4 columns (field_0;field_1;field_2;field_3), all of which are integers. The directory never changes; it is historical data.
    tail_files: false

queue.mem:
  events: 262144
  flush.min_events: 32768
  flush.timeout: 5s

output.logstash:
  hosts:
    - "ls-01-nathan"
    - "ls-02-nathan"
    - "ls-03-nathan"
  bulk_max_size: 32768
  loadbalance: true
  pipelining: 8
  worker: 40

http.enabled: true

monitoring.elasticsearch:
  hosts: ["es-01-nathan", "es-02-nathan", "es-03-nathan"]
logstash.yml
(I removed all the comments and commented-out default lines)
pipeline.id: ls-01-pipeline
pipeline.workers: 48
pipeline.batch.size: 131072
pipeline.batch.delay: 50
queue.type: memory
log.level: info
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://es-01-nathan:9200", "http://es-02-nathan:9200", "http://es-03-nathan:9200"]
xpack.monitoring.elasticsearch.sniffing: true
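For what it's worth, per-plugin timings should also be visible through Logstash's node stats API (port 9600 by default), which should show whether the filters or the Elasticsearch output are eating the time; a sketch, run on one of the Logstash hosts:

curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'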
logstash-config.conf
input {
  beats {
    port => 5044
  }
}

filter {
  csv {
    columns => ["field_0", "field_1", "field_2", "field_3"]
    separator => ";"
  }
  mutate {
    remove_field => ["field_0", "message", "host", "@timestamp", "@version"]
    split => { "[log][file][path]" => "/" }
    split => { "[log][file][path][-1]" => "_" }
    copy => { "[log][file][path][-1][0]" => "timestamp" }
    convert => {
      "field_1" => "integer"
      "field_2" => "integer"
      "field_3" => "integer"
    }
  }
  date {
    match => [ "timestamp", "yyyyMMddHHmm" ]
    target => "timestamp"
  }
  mutate {
    remove_field => [
      "log",
      "agent",
      "tags",
      "ecs",
      "input"
    ]
  }
}

output {
  elasticsearch {
    hosts => ["es-01-nathan:9200", "es-02-nathan-4u:9200", "es-03-nathan-4u:9200"]
    index => "index"
  }
}
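To check whether the Elasticsearch output (rather than the csv/mutate filters) is what caps the pipeline, the output could temporarily be swapped for a dots codec, so each event prints a single character on stdout and the rate can be read with pv; a sketch of that test output block (not my real config):

output {
  stdout { codec => dots }
}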
When I change the Filebeat output to output.console and check the throughput (pv -Warl), I get around 85k events/s. When I send the same output to Logstash with load balancing enabled, I only get around 40k events/s.
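For reference, the 85k/s figure comes from piping Filebeat's console output through pv and counting lines per second; roughly this (a sketch, exact paths differ on my machines):

filebeat -e -c /etc/filebeat/filebeat.yml | pv -Warl > /dev/null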
I have tried increasing the workers and bulk_max_size, but 40k/s is the maximum I can get. I need to get it higher.