Performance decrease after upgrading to LS 5

We have a system in place that collects, parses, and indexes our firewall logs, with one server doing the actual Logstash filtering before the logs go into ES. After upgrading from Logstash 2.4 to 5.2, throughput dropped from upwards of 6,000 events per second to nearly 2,500, and I had to set up a second server to handle the work that one server used to do with no problem. Eventually we downgraded back to 2.4. We're using the following configuration:

input {
    rabbitmq {
        host => "************"
        durable => true
        queue => "firewall-logs.ha2"
        key => "firelogs"
        exchange => "logstash.haa"
        threads => 16
        user => "************"
        password => "***********"
        prefetch_count => 1500
    }
}

filter {
    # Extract fields from each of the detailed message types
    # The patterns used below are included in the Logstash core (since 1.4.2).
    grok {
      match => [
        "message", "%{CISCOFW104001}",
        "message", "%{CISCOFW104002}",
        "message", "%{CISCOFW104003}",
        "message", "%{CISCOFW104004}",
        "message", "%{CISCOFW105003}",
        "message", "%{CISCOFW105004}",
        "message", "%{CISCOFW105005}",
        "message", "%{CISCOFW105008}",
        "message", "%{CISCOFW105009}",
        "message", "%{CISCOFW106001}",
        "message", "%{CISCOFW106006_106007_106010}",
        "message", "%{CISCOFW106014}",
        "message", "%{CISCOFW106015}",
        "message", "%{CISCOFW106021}",
        "message", "%{CISCOFW106023}",
        "message", "%{CISCOFW106100}",
        "message", "%{CISCOFW110002}",
        "message", "%{CISCOFW302010}",
        "message", "%{CISCOFW302013_302014_302015_302016}",
        "message", "%{CISCOFW302020_302021}",
        "message", "%{CISCOFW305011}",
        "message", "%{CISCOFW313001_313004_313008}",
        "message", "%{CISCOFW313005}",
        "message", "%{CISCOFW402117}",
        "message", "%{CISCOFW402119}",
        "message", "%{CISCOFW419001}",
        "message", "%{CISCOFW419002}",
        "message", "%{CISCOFW500004}",
        "message", "%{CISCOFW602303_602304}",
        "message", "%{CISCOFW710001_710002_710003_710005_710006}",
        "message", "%{CISCOFW713172}",
        "message", "%{CISCOFW733100}"
      ]
    }

    # Parse the syslog severity and facility
    syslog_pri { }

    geoip {
        source => "src_ip"
        target => "geoip"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }

    mutate {
        convert => [ "[geoip][coordinates]", "float"]
    }

    mutate {
        uppercase => [ "protocol" ]
    }
}

output {
    if [type] == "admin_asa" {
        elasticsearch {
            hosts => ["************"]
            flush_size => 1500
            index => "admin-asa-%{+YYYY.MM.dd}"
        }
    } else {
        elasticsearch {
            hosts => ["************"]
            flush_size => 1500
        }
    }
}

I've played with the pipeline and output worker settings, but nothing got me back to 2.4-level performance. Memory is not the issue. If anyone has suggestions, I'd love to hear them.
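
For reference, this is roughly what I was experimenting with in logstash.yml (the values are just examples of what I tried, not a tuned recommendation, and the setting names are from memory of the 5.x docs, so double-check them against your version):

# logstash.yml (Logstash 5.x) -- example values only
pipeline.workers: 16          # filter/output worker threads (same as the -w flag)
pipeline.batch.size: 125      # events per worker batch (same as the -b flag); 125 is the 5.x default
pipeline.output.workers: 4    # workers per output plugin instance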

In version 5.x, the maximum bulk size sent to Elasticsearch is limited by the internal batch size (the -b command-line parameter). This defaults to 125, which results in smaller bulk requests to Elasticsearch. Try setting it to the flush size you previously used and see if it makes a difference.
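
For example, something along these lines (the config path here is just a placeholder) would make the internal batches roughly match your old flush_size:

bin/logstash -f /etc/logstash/conf.d/firewall.conf -w 16 -b 1500

# or equivalently in logstash.yml:
pipeline.batch.size: 1500

Keep in mind that each worker pulls a batch of this size, so in-flight memory grows with pipeline.workers * pipeline.batch.size; you may need to give the JVM a bit more heap when raising it.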

I tried setting that to 1500 as well but it made little if any difference.
