Logstash is periodically hanging (unresponsive). How to recognize and fix?

PaulDonskikh · May 31, 2019, 9:37pm

Hi!
I'm working on an ELK-stack installation.
There's a Logstash instance which, for some as-yet-unknown reason, periodically (once every 2-10 days) stops inserting data into ES.

The most confusing part is that Logstash gives no indication that anything is wrong.
I mean the Logstash process does not crash.
The logs do not show any errors. What's more, it keeps writing log entries as if everything were fine (the jdbc_static plugin logs every hour).

The only way I can tell that Logstash has stopped inserting data into ES is by looking in Kibana and seeing no new data.

My current thoughts about this issue are:
I have a lot of devices sending data into Logstash (through the lumberjack input plugin). Maybe once in a while my pipeline gets overloaded by the input data (or by the heavy processing before the output). Logstash is a multi-threaded app, so maybe some threads, responsible for either input or output, get blocked.
But the app itself continues working.
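
One way to test the blocked-thread hypothesis is to ask the Logstash monitoring API for its hot threads report while the instance is hung. The sketch below is only an illustration: it assumes the API is reachable on its default address (localhost:9600), that the JSON body has the `hot_threads.threads[]` shape with `name` and `state` fields, and that a blocked thread is reported with the state `blocked`. The helper names are made up for this example.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: the Logstash monitoring API listens on its default address.
HOT_THREADS_URL = "http://localhost:9600/_node/hot_threads?threads=50"


def fetch_hot_threads(url=HOT_THREADS_URL, timeout=5):
    """Return the parsed JSON body of the hot threads endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def _threads(report):
    """Extract the thread list, tolerating a missing or empty report."""
    return report.get("hot_threads", {}).get("threads", [])


def summarize_thread_states(report):
    """Count threads per reported state (e.g. 'blocked', 'waiting')."""
    return Counter(str(t.get("state", "unknown")).lower() for t in _threads(report))


def blocked_thread_names(report):
    """List the names of threads whose reported state is 'blocked'."""
    return [
        t.get("name", "?")
        for t in _threads(report)
        if str(t.get("state", "")).lower() == "blocked"
    ]
```

Calling `summarize_thread_states(fetch_hot_threads())` once while Logstash is healthy and once while it is hung gives two snapshots to compare. If the worker or output threads show up in `blocked_thread_names` only in the hung snapshot, that supports the theory.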

So far the only way to "unblock" those threads (and revive Logstash) is to restart Logstash, and then wait another 2-10 days until it stops inserting data again.

But this is a baaaaad way of solving the issue.

Has anyone had this kind of issue before?
How do I make Logstash unblock its own threads or restart them?
Or how do I make Logstash restart itself when its threads are blocked?
Maybe there's a known way to approach that problem.
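
As a stopgap for the "restart itself" question, an external watchdog can poll the Logstash monitoring API and restart the service when the `events.out` counter stops growing. This is a minimal sketch, not a known fix for the hang. It assumes the API is on its default address (localhost:9600), that a steady stream of events is normal for this pipeline, and that `RESTART_COMMAND` is replaced with whatever restarts Logstash on your host (the value shown is a placeholder).

```python
import json
import subprocess
import time
from urllib.request import urlopen

# Assumption: default monitoring API address.
STATS_URL = "http://localhost:9600/_node/stats/events"
# Placeholder: adjust to the init system actually used on the host.
RESTART_COMMAND = ["initctl", "restart", "logstash"]


def read_events_out(url=STATS_URL, timeout=5):
    """Return the total number of events emitted by the outputs so far."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["events"]["out"]


def count_stalled_checks(samples):
    """Return how many of the most recent checks in a row saw no change."""
    stalled = 0
    for previous, current in zip(samples, samples[1:]):
        stalled = stalled + 1 if current == previous else 0
    return stalled


def watch(interval_seconds=60, max_stalled_checks=5):
    """Poll forever; restart Logstash after too many checks without output."""
    samples = []
    while True:
        try:
            samples.append(read_events_out())
        except (OSError, ValueError, KeyError):
            # API unreachable or unexpected body: treat it like no progress.
            samples.append(samples[-1] if samples else 0)
        samples = samples[-(max_stalled_checks + 1):]
        if count_stalled_checks(samples) >= max_stalled_checks:
            subprocess.run(RESTART_COMMAND, check=False)
            samples = []  # start counting afresh after the restart
        time.sleep(interval_seconds)
```

With the defaults, `watch()` restarts Logstash after about five minutes without any output. The thresholds should be tuned to how quiet the pipeline can legitimately be, otherwise the watchdog will restart a healthy instance.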

Info about my installation:

ELK stack version 6.6.0.
openjdk version "1.8.0_201"
OS Amazon Linux AMI 2018.03
pipeline.workers: 8
LS_JAVA_OPTS=" -Xmx2g -Xms2g"
Logstash is accepting data with the lumberjack input plugin (with congestion_threshold => 15).
I have a lot of stuff happening inside the filter section of the config. There's grok, mutate, date, ruby, jdbc_streaming and jdbc_static plugins.
I also have many devices sending data. Each device can send up to 1000 events at a time, every 30 seconds.
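
To judge whether the pipeline can plausibly be overloaded, the figures above can be turned into a rough peak load. The sketch below is a back-of-the-envelope estimate only. The device count is not given in this post, so the 300 used in the usage note is purely hypothetical, and the per-event budget assumes the 8 workers share the load evenly.

```python
def peak_events_per_second(devices, batch_size=1000, interval_seconds=30):
    """Worst case: every device sends a full batch in every interval."""
    return devices * batch_size / interval_seconds


def per_event_budget_ms(events_per_second, workers=8):
    """Average filter+output time per event that the workers can afford."""
    return 1000.0 * workers / events_per_second
```

For a hypothetical fleet of 300 devices, `peak_events_per_second(300)` gives 10000.0 events per second, and `per_event_budget_ms(10000.0)` gives 0.8 ms per event. If the filters (for example a jdbc_streaming lookup) take longer than that on average, the input will back up.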

Thank you for any help!

stcdarrell · June 2, 2019, 1:14pm

I have had many of the same issues you are reporting.
Things that could help:
- If you are using a memory queue, try a persistent queue.
- Think about implementing a message queue (Redis, RabbitMQ, or Kafka). The workflow would be something like:
  Logstash --> message queue <--> another Logstash server (that does the processing/enriching) --> Elasticsearch
- Install Metricbeat on your Logstash server to monitor usage and workflow.
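
For the monitoring suggestion, the per-plugin counters exposed by the Logstash monitoring API can show which filter is eating the time. The following is a sketch under assumptions: the API is on its default address (localhost:9600), the pipeline is named `main`, and the response has the `pipelines.<name>.plugins.filters[]` shape with `events.out` and `events.duration_in_millis`. The function names are invented for this example.

```python
import json
from urllib.request import urlopen

# Assumption: default monitoring API address.
PIPELINE_STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_pipeline_stats(url=PIPELINE_STATS_URL, timeout=5):
    """Return the parsed JSON body of the pipeline stats endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def slowest_filters(stats, pipeline="main", top=3):
    """Rank filters by average milliseconds spent per emitted event."""
    filters = stats["pipelines"][pipeline]["plugins"]["filters"]
    rows = []
    for plugin in filters:
        events = plugin.get("events", {})
        processed = events.get("out", 0)
        if processed:  # skip plugins that have not emitted anything yet
            average_ms = events.get("duration_in_millis", 0) / processed
            rows.append((plugin.get("name", "?"), plugin.get("id", "?"), average_ms))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]
```

Running `slowest_filters(fetch_pipeline_stats())` periodically and keeping the results gives a history to look at after the next hang. A filter whose average time climbs steadily before the stall is a good first suspect.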

PaulDonskikh · June 3, 2019, 3:09pm

Darrel, thank you for your response!
I've been considering using redis. Eventually I might end up using it but at the moment I'm trying to avoid any additional entities in our setup.

So you've had the same issues, and after spending some time looking for a solution you decided to use something that prevents Logstash from hanging.
From that I have to draw the unfortunate conclusion that you have not found a way to solve it at the Logstash level, which would be to unblock the Logstash thread (or whatever it is; that's just my assumption) :(.

Following your advice I'm looking at how I can use Metricbeat...

PaulDonskikh · June 14, 2019, 9:56pm

Found a related topic, also without a solution.

system · July 12, 2019, 9:56pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
