Increasing throughput from Filebeat to Logstash

I am using the latest versions of Filebeat, Logstash and Elasticsearch on Ubuntu 18.04 machines.

I have:
2 Filebeat VMs, each with 8 CPU cores and 16 GB memory
3 Logstash VMs, each with 24 CPU cores and 64 GB memory (31 GB heap)
3 Elasticsearch VMs, each with 16 CPU cores and 64 GB memory (31 GB heap)

filebeat.yml (identical on both machines; they share a physical SSD mounted at '/mnt/data', but each has its own allocated space and partition)

filebeat.inputs:
        - type: log
          enabled: true
          paths:
              - /mnt/data/*.csv # This directory contains 502 CSVs with a total of 780 million (780,000,000) lines and 4 columns (field_0;field_1;field_2;field_3), all integers. The directory never changes; it is historical data.
          tail_files: false

queue.mem:
        events: 262144
        flush.min_events: 32768
        flush.timeout: 5s
output.logstash:
        hosts:
            - "ls-01-nathan"
            - "ls-02-nathan"
            - "ls-03-nathan"
        bulk_max_size: 32768
        loadbalance: true
        pipelining: 8
        worker: 40
http.enabled: true
monitoring.elasticsearch:
        hosts: ["es-01-nathan", "es-02-nathan", "es-03-nathan"]
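
For what it's worth, with http.enabled set as above, Filebeat also exposes its internal metrics over HTTP (on localhost:5066 unless http.host/http.port are overridden), which should show whether events are being retried or just acknowledged slowly by Logstash. A minimal check, assuming the default host and port:

        curl -s 'http://localhost:5066/stats?pretty'

The libbeat.output.events counters (acked, active, batches, failed) should indicate whether back-pressure is coming from the Logstash side.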

logstash.yml
(I removed all comments and commented-out lines)

pipeline.id: ls-01-pipeline
pipeline.workers: 48
pipeline.batch.size: 131072
pipeline.batch.delay: 50
queue.type: memory
log.level: info
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://es-01-nathan:9200", "http://es-02-nathan:9200", "http://es-03-nathan:9200"]
xpack.monitoring.elasticsearch.sniffing: true
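
Similarly, the Logstash monitoring API (port 9600 by default) should show where time is being spent per plugin, e.g.:

        curl -s 'http://ls-01-nathan:9600/_node/stats/pipelines?pretty'

The per-plugin duration_in_millis values, and queue_push_duration_in_millis on the beats input, should indicate whether the filter, the elasticsearch output, or the input itself is the slow part.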

logstash-config.conf

input {
        beats {
                port => 5044
        }
}

filter {
        csv {
                columns => ["field_0", "field_1", "field_2", "field_3"]
                separator => ";"
        }
        mutate {
                remove_field => [ "field_0", "message", "host", "@timestamp", "@version"]

                split => { "[log][file][path]" => "/" }
                split => { "[log][file][path][-1]" => "_" }
                copy => { "[log][file][path][-1][0]" => "timestamp" }

                convert => {
                        "field_1" => "integer"
                        "field_2" => "integer"
                        "field_3" => "integer"
                }
    }
    date {
            match => [ "timestamp", "yyyyMMddHHmm" ]
            target => "timestamp"
    }
    mutate {
            remove_field => [
            "log",
            "agent",
            "tags",
            "ecs",
            "input"]
    }
}


output {
        elasticsearch {
                hosts => ["es-01-nathan:9200", "es-02-nathan-4u:9200", "es-03-nathan-4u:9200"]
                index => "index"
        }
}
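
A follow-up test that should show whether the elasticsearch output or the Beats input/filter side is the limit: temporarily replace the elasticsearch output with a dots codec and pipe Logstash through pv. A rough sketch (with codec => dots each event is written as a single byte, so pv -War should approximate events/s):

        output {
                stdout { codec => dots }
        }

        bin/logstash -f logstash-config.conf | pv -War > /dev/null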

When I change the Filebeat output to output.console and measure the throughput with pv (pv -Warl), I get around 85k events/s. When I send the same output to Logstash with load balancing enabled, I get around 40k events/s.
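
For reference, the console measurement looks roughly like this (console output instead of the Logstash output, events piped through pv; exact flags may differ):

        output.console:
                pretty: false

        filebeat -e -c filebeat.yml 2>/dev/null | pv -Warl > /dev/null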

I have tried increasing the workers and bulk_max_size, but 40k/s is the maximum I can get, and I need it to be higher.
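
If the limit turns out to be indexing on the Elasticsearch side rather than the Filebeat-to-Logstash link, one option for a one-off historical load like this would be to relax refresh and replication on the target index while loading, and restore the defaults afterwards. A sketch, using the "index" name from the output above:

        curl -X PUT 'http://es-01-nathan:9200/index/_settings' -H 'Content-Type: application/json' -d '
        {
                "index": {
                        "refresh_interval": "-1",
                        "number_of_replicas": 0
                }
        }'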
