Unacceptable Logstash startup times

There has been a disturbing trend in Logstash performance and stability. On the stability side, 6.1.3 is the last version that can run for a long time on macOS without crashing, and the 6.6.x containers crash within minutes with most of our pipelines (although some of those same pipelines run fine when installed natively on Ubuntu 18.04). Meanwhile, startup times have grown to the point of being completely unacceptable. Here are three examples from our collection, processing, and indexing instances:

Collection Pipelines

6.1.3 - 0:08 (mm:ss)

[2019-02-24T08:06:59,455][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.1.3"}
[2019-02-24T08:07:07,108][INFO ][logstash.agent           ] Pipelines running {:count=>5, :pipelines=>["collect-beats", "collect-ipfix", "collect-netflow", "collect-sflow", "collect-syslog"]}

6.6.1 - 0:26

[2019-02-24T07:54:56,510][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.6.1"}
[2019-02-24T07:55:22,494][INFO ][logstash.agent           ] Pipelines running {:count=>5, :running_pipelines=>[:"collect-syslog", :"collect-netflow", :"collect-ipfix", :"collect-beats", :"collect-sflow"], :non_running_pipelines=>[]}

Processing Pipelines

6.1.3 - 4:25

[2019-02-24T08:15:17,540][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.1.3"}
[2019-02-24T08:19:42,369][INFO ][logstash.agent           ] Pipelines running {:count=>30, :pipelines=>["process-filebeat-suricata_eve", "process-ipfix", "process-netflow", "process-sflow", "process-shared-conn", "process-syslog", "process-syslog-access", "process-syslog-apache_httpd", "process-syslog-application", "process-syslog-barracuda_cgfw", "process-syslog-blackridge_tac", "process-syslog-checkpoint", "process-syslog-cisco_ios", "process-syslog-citrix_netscaler_appfw", "process-syslog-clamav", "process-syslog-dns_logger", "process-syslog-dnsmasq", "process-syslog-forcepoint_ngfw", "process-syslog-fortinet_fortios", "process-syslog-iptables", "process-syslog-juniper_junos", "process-syslog-lastline_enterprise", "process-syslog-network", "process-syslog-nginx", "process-syslog-palo_alto", "process-syslog-squid", "process-syslog-system", "process-syslog-tripwire", "process-syslog-ulogd", "process-winlogbeat"]}

6.6.1 - 44:35 (!!!)

[2019-02-24T07:12:37,895][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.6.1"}
[2019-02-24T07:57:12,720][INFO ][logstash.agent           ] Pipelines running {:count=>30, :running_pipelines=>[:"process-sflow", :"process-syslog-nginx", :"process-syslog-juniper_junos", :"process-syslog-forcepoint_ngfw", :"process-ipfix", :"process-syslog-iptables", :"process-netflow", :"process-syslog-fortinet_fortios", :"process-syslog-blackridge_tac", :"process-syslog-system", :"process-syslog", :"process-syslog-tripwire", :"process-syslog-ulogd", :"process-syslog-cisco_ios", :"process-syslog-lastline_enterprise", :"process-winlogbeat", :"process-syslog-barracuda_cgfw", :"process-syslog-checkpoint", :"process-filebeat-suricata_eve", :"process-syslog-palo_alto", :"process-syslog-apache_httpd", :"process-shared-conn", :"process-syslog-dns_logger", :"process-syslog-access", :"process-syslog-citrix_netscaler_appfw", :"process-syslog-application", :"process-syslog-dnsmasq", :"process-syslog-network", :"process-syslog-clamav", :"process-syslog-squid"], :non_running_pipelines=>[]}

Indexing Pipelines

6.1.3 - 0:06

[2019-02-24T08:06:26,226][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.1.3"}
[2019-02-24T08:06:31,829][INFO ][logstash.agent           ] Pipelines running {:count=>30, :pipelines=>["index-filebeat-suricata_eve", "index-ipfix", "index-netflow", "index-sflow", "index-syslog-access", "index-syslog-apache_httpd", "index-syslog-application", "index-syslog-barracuda_cgfw", "index-syslog-blackridge_tac", "index-syslog-cef", "index-syslog-checkpoint", "index-syslog-cisco_ios", "index-syslog-citrix_netscaler_appfw", "index-syslog-clamav", "index-syslog-dns_logger", "index-syslog-dnsmasq", "index-syslog-forcepoint_ngfw", "index-syslog-fortinet_fortios", "index-syslog-generic", "index-syslog-iptables", "index-syslog-juniper_junos", "index-syslog-lastline_enterprise", "index-syslog-network", "index-syslog-nginx", "index-syslog-palo_alto", "index-syslog-squid", "index-syslog-system", "index-syslog-tripwire", "index-syslog-ulogd", "index-winlogbeat"]}

6.6.1 - 2:33

[2019-02-24T07:40:17,555][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.6.1"}
[2019-02-24T07:42:50,408][INFO ][logstash.agent           ] Pipelines running {:count=>30, :running_pipelines=>[:"index-syslog-cisco_ios", :"index-ipfix", :"index-syslog-ulogd", :"index-syslog-lastline_enterprise", :"index-syslog-network", :"index-syslog-forcepoint_ngfw", :"index-syslog-clamav", :"index-syslog-citrix_netscaler_appfw", :"index-syslog-squid", :"index-sflow", :"index-syslog-checkpoint", :"index-syslog-application", :"index-syslog-system", :"index-syslog-tripwire", :"index-syslog-barracuda_cgfw", :"index-syslog-dnsmasq", :"index-syslog-palo_alto", :"index-syslog-nginx", :"index-syslog-iptables", :"index-syslog-juniper_junos", :"index-syslog-cef", :"index-filebeat-suricata_eve", :"index-netflow", :"index-winlogbeat", :"index-syslog-blackridge_tac", :"index-syslog-dns_logger", :"index-syslog-generic", :"index-syslog-apache_httpd", :"index-syslog-access", :"index-syslog-fortinet_fortios"], :non_running_pipelines=>[]}
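
For context, each instance defines its pipelines with Logstash's multiple-pipelines support in pipelines.yml. A trimmed sketch (the pipeline IDs are taken from the logs above; the config paths are placeholders for illustration, not our real layout):

```yaml
- pipeline.id: collect-syslog
  path.config: "/etc/logstash/pipelines/collect-syslog/*.conf"
- pipeline.id: collect-beats
  path.config: "/etc/logstash/pipelines/collect-beats/*.conf"
# ...and so on: 5 entries on the collection instance, 30 each on the
# processing and indexing instances.
```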

For now we have been forced to standardize on Logstash 6.1.3 (with updated plugins). It would be good to be able to upgrade eventually, but it seems that no real testing is being done beyond very simple configurations.
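
(Pinning the core while keeping plugins current is straightforward with the stock plugin manager; a sketch, assuming a package install under /usr/share/logstash:)

```sh
cd /usr/share/logstash
# Update every installed plugin against the pinned 6.1.3 core.
bin/logstash-plugin update
# Or update plugins one at a time, e.g.:
bin/logstash-plugin update logstash-filter-dissect
```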

Is there any effort going into reversing this trend?

Have you checked out the known JRuby startup-time issues and the usual workarounds for them?
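
For example, JRuby's dev-mode settings are commonly suggested: they speed up startup at the expense of long-running code speed. A minimal sketch, assuming Logstash passes these standard JRuby system properties through to its embedded runtime via LS_JAVA_OPTS (this is an illustration, not a documented Logstash tuning recipe):

```sh
# Sketch only: favour startup time over long-running throughput.
# jruby.compile.mode=OFF disables the JIT compiler; turning off
# invokedynamic avoids expensive call-site setup during boot.
export LS_JAVA_OPTS="-Djruby.compile.mode=OFF -Djruby.compile.invokedynamic=false"
bin/logstash --path.settings /etc/logstash
```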

Mark, hacking stuff behind the scenes cannot be the answer, especially when some of those changes bring negative side effects like running "at the expense of long-running code speed".

There has been a consistent degradation of Logstash performance and stability since 6.1.3 that points to insufficient testing with real-world usage patterns. While we would like to stick with Logstash, if this trend doesn't change, we will be forced to move on to something else.

Slow startup is a pretty common JRuby issue, and those workarounds are what we suggest everyone look at when similar questions come up. Waving it off as "hacking stuff behind the scenes" certainly won't fix anything.

It's also very hard for us to test "real-world", because what does that even mean? We do run a suite of tests, including standardised Apache logs, but is that what you consider real world? If not, what is? Can you provide config examples like the ones above that consistently show degradation? Have you raised GitHub issues for the performance drops you are seeing?
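
If it helps standardise the numbers, startup time can be derived straight from the log, the same way the figures above appear to have been produced. A hedged sketch (assumes GNU grep/sed/date and the default plain-text log format):

```sh
# Seconds from "Starting Logstash" to "Pipelines running" in a
# logstash-plain.log; timestamps look like [2019-02-24T08:06:59,455].
t0=$(grep -m1 'Starting Logstash' logstash-plain.log | sed 's/^\[\([^]]*\)\].*/\1/')
t1=$(grep -m1 'Pipelines running' logstash-plain.log | sed 's/^\[\([^]]*\)\].*/\1/')
# GNU date wants a dot, not a comma, before the milliseconds.
echo "startup: $(( $(date -d "${t1/,/.}" +%s) - $(date -d "${t0/,/.}" +%s) ))s"
```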

We'd be more than happy to look into things further, but insinuating crappy coding practises on the part of the team, and then not providing any concrete information for us to investigate, just doesn't help anyone, unfortunately.

I created the thread here because GitHub specifically says: "Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here." So no, I haven't raised GitHub issues... yet. However, I now have clarity on what I need to do next.

If you've got specific configs with timings, then raising those as an issue would be great :slight_smile:
