Logstash is overloaded?

sfchrisgleason · June 8, 2018, 3:37pm

Hey all,

I've got an ELK stack for development and all was well, up until recently. Not quite sure what's going on, but I'm pretty sure it's logstash getting creamed and not able to keep up.

The basics: I have an 8-node Elasticsearch 6.2.4 cluster, a 2-node Logstash cluster, and a single Kibana host.

When I let Logstash run I see no entries in Kibana for the Filebeat indexes. All the syslog ones are fine.

If I bounce Logstash, the Filebeat indexes start showing data in the histogram, then peter out after a few minutes, and I start seeing this in all the Filebeat logs:

| Timestamp | Level | Source | Message |
|---|---|---|---|
| 2018-06-08T08:20:14.792-0700 | ERROR | logstash/async.go:235 | Failed to publish events caused by: read tcp 172.x.x.251:56820->172.x.x.246:5044: i/o timeout |
| 2018-06-08T08:20:14.843-0700 | ERROR | logstash/async.go:235 | Failed to publish events caused by: client is not connected |
| 2018-06-08T08:20:14.887-0700 | ERROR | logstash/async.go:235 | Failed to publish events caused by: client is not connected |
| 2018-06-08T08:20:14.953-0700 | ERROR | logstash/async.go:235 | Failed to publish events caused by: client is not connected |
| 2018-06-08T08:20:15.844-0700 | ERROR | pipeline/output.go:92 | Failed to publish events: client is not connected |
| 2018-06-08T08:20:15.887-0700 | ERROR | pipeline/output.go:92 | Failed to publish events: client is not connected |
| 2018-06-08T08:20:15.953-0700 | ERROR | pipeline/output.go:92 | Failed to publish events: client is not connected |

I am able to telnet from the same host to that port, so it's not connectivity. I've also played with Filebeat and tried changing bulk_max_size up and down, to no avail.

When I tail the Logstash logs, I can see the events coming in.

I guess my question is: how do I tune Logstash? I increased the Java heap and adjusted the pipeline settings in Logstash, but I'm seeing no difference. I'm monitoring the in/out events and heap usage, but I don't know what's healthy and what isn't.

Each Logstash server shows the in and out events just going up and up and up, starting around 1 million and, after a while, reaching upwards of 10 million.

My heap usage fluctuates anywhere between 25-80% going up and down.

I'm not entirely sure what I need to do to relieve the pressure, other than scale the LS cluster horizontally, but I'd rather understand how to tune and troubleshoot the service properly.

Let me know where you think I should start!

Thanks!

sfchrisgleason · June 8, 2018, 4:11pm

Soooo, right after I posted this I changed the Filebeat bulk_max_size to 4096 just for giggles, and I can see the data coming in without stopping. I guess this could have been a filebeat thing then? That said, I'll keep an eye on it and see if it peters out again.

Thanks!

andrewkroh · June 8, 2018, 5:03pm

What versions are you using for Filebeat and Logstash?

I'd make sure you have the latest version of the beats input plugin installed in Logstash. (update instructions) (changelog)

sfchrisgleason · June 11, 2018, 10:40am

Thanks Andrew. I'll check that out.

It does appear that data is still coming in so that change appears to have resolved the issue.

IIRC, I can only use the plugin framework if I've compiled from source? I use docker/repo packages.

If I can great, I'll definitely update. I'll check it out either way.

sfchrisgleason · June 11, 2018, 12:24pm

Also, I have a quick question about the API stats. Are the events in/out values from the pipeline stats cumulative?

"pipeline": {
    "events": {
        "duration_in_millis": 93726814,
        "filtered": 26448856,
        "in": 26448856,
        "out": 26448856,
        "queue_push_duration_in_millis": 9758765

If so I should probably set my monitor to use this as a delta as I'm not getting a good idea of in/out per minute.
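For reading a single snapshot like the one above, here is a minimal Python sketch. It assumes the stats document has the same shape as the excerpt (closing braces added so it parses), and `summarize_events` is a hypothetical helper, not part of any Logstash tooling. Treating `in - out` as events still in flight and `duration_in_millis / out` as a rough cost per event is my own reading of the fields, so take it as a starting point rather than an official health metric.

```python
import json

# Same values as the excerpt above; the closing braces are added so it parses.
STATS_JSON = """
{
  "pipeline": {
    "events": {
      "duration_in_millis": 93726814,
      "filtered": 26448856,
      "in": 26448856,
      "out": 26448856,
      "queue_push_duration_in_millis": 9758765
    }
  }
}
"""


def summarize_events(stats: dict) -> dict:
    """Hypothetical helper: derive two rough indicators from one snapshot."""
    events = stats["pipeline"]["events"]
    # Assumption: events received but not yet emitted are still in the pipeline.
    in_flight = events["in"] - events["out"]
    # Assumption: total processing time divided by events out approximates
    # the average cost per event. Guard against division by zero.
    if events["out"]:
        ms_per_event = events["duration_in_millis"] / events["out"]
    else:
        ms_per_event = None
    return {"in_flight": in_flight, "ms_per_event": ms_per_event}


summary = summarize_events(json.loads(STATS_JSON))
print(summary)
```

With the numbers from the excerpt, `in` and `out` are equal, so nothing appears to be stuck in the pipeline at the moment of that snapshot.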


andrewkroh · June 11, 2018, 12:47pm

You can use LS plugins with any release. Internally all of the standard inputs/filters/outputs are plugins themselves. These plugins are just bundled in by default with LS.

I believe those are cumulative. So doing a derivative will be needed to get rates. I'm pretty sure the x-pack monitoring UI for Logstash uses a derivative aggregation to display these values.

BTW, the monitoring feature is included in the x-pack basic license, which is free. It includes some of these metrics plus the pipeline viewer.

system · July 9, 2018, 12:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
