Logstash periodically hangs (becomes unresponsive). How can I detect and fix this?

I'm working on an ELK stack installation.
There's a Logstash instance that, for a reason I haven't identified yet, periodically (once every 2-10 days) stops inserting data into Elasticsearch (ES).

The most confusing part is that Logstash gives no indication that anything is wrong.
The Logstash process does not crash.
The logs show no errors. In fact, it keeps writing log entries as if everything were fine (the jdbc_static plugin logs something every hour).

The only way I can tell that Logstash has stopped inserting data into ES is by looking in Kibana and seeing no new data.

My current thinking about this issue is as follows.
I have a lot of devices sending data into Logstash (via the lumberjack input plugin). Maybe once in a while my pipeline gets overloaded with input data (or with the amount of processing done before the output). Logstash is a multi-threaded application, so maybe some of the threads responsible for input or output get blocked,
while the application itself keeps running.
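
One way to test the blocked-threads theory is to capture what Logstash's threads are doing while it is stuck, before restarting it, and compare that with a capture from a healthy period. The sketch below assumes the Logstash monitoring API is listening on its default address (`localhost:9600`) and calls its `_node/hot_threads` endpoint with `human=true` for plain-text output; the helper names `hot_threads_url` and `dump_hot_threads` are hypothetical. Hot threads lists the busiest threads, so for a complete picture of blocked threads a full JVM thread dump (`jstack <pid>`) is a useful complement.

```python
import urllib.request


def hot_threads_url(host="localhost", port=9600, threads=10):
    """URL of the Logstash hot threads API; human=true requests plain text."""
    return f"http://{host}:{port}/_node/hot_threads?human=true&threads={threads}"


def dump_hot_threads(host="localhost", port=9600, threads=10, timeout=30.0):
    """Fetch the hot threads report so it can be saved before a restart."""
    url = hot_threads_url(host, port, threads)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, writing `dump_hot_threads()` to a file the moment Kibana stops showing new data preserves the evidence that a restart would otherwise destroy.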

Eventually, the only way to "unblock" those threads (and revive Logstash) is to restart Logstash, and then wait another 2-10 days until it stops inserting data again.

But this is a baaaaad way of solving the issue.

Has anyone had this kind of issue before?
How do I make Logstash unblock or restart its own threads?
Or how do I make Logstash restart itself when its threads are blocked?
Maybe there's a known way to approach this problem.
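
A common workaround for "restart itself when stuck" is an external watchdog. The sketch below polls Logstash's node stats API (`_node/stats/pipelines`, assumed on the default `localhost:9600`) and flags any pipeline whose `events.out` counter did not move between two samples. Because devices here send every 30 seconds, an unchanged output counter over several minutes is a strong stall signal. The function names, the 5-minute interval and the `on_stall` hook are hypothetical choices. The restart itself is left to the hook because the right command depends on how Logstash is installed.

```python
import json
import time
import urllib.request

# Assumption: the Logstash monitoring API is on its default address.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_pipeline_events(url=STATS_URL, timeout=10.0):
    """Return {pipeline_id: {"in": n, "out": n}} from the node stats API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        stats = json.load(resp)
    result = {}
    for pipeline_id, pipeline in stats.get("pipelines", {}).items():
        events = pipeline.get("events") or {}
        result[pipeline_id] = {"in": events.get("in", 0), "out": events.get("out", 0)}
    return result


def stalled_pipelines(prev, curr):
    """IDs of pipelines whose 'out' counter did not change between samples."""
    stalled = []
    for pipeline_id, now in curr.items():
        before = prev.get(pipeline_id)
        # Pipelines new in this sample are skipped; a lower count than before
        # means Logstash restarted, which is not treated as a stall either.
        if before is not None and now["out"] == before["out"]:
            stalled.append(pipeline_id)
    return sorted(stalled)


def watch(interval_s=300, on_stall=print):
    """Poll forever, calling on_stall(pipeline_id) for each stalled pipeline."""
    prev = fetch_pipeline_events()
    while True:
        time.sleep(interval_s)
        curr = fetch_pipeline_events()
        for pipeline_id in stalled_pipelines(prev, curr):
            # Hypothetical hook: dump threads, send an alert, or restart the service.
            on_stall(pipeline_id)
        prev = curr
```

In practice you would probably also catch `urllib.error.URLError` inside `watch()` so the watchdog survives while Logstash itself is being restarted.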

Info about my installation:

  1. ELK stack version 6.6.0.
  2. openjdk version "1.8.0_201"
  3. OS: Amazon Linux AMI 2018.03
  4. pipeline.workers: 8
  5. LS_JAVA_OPTS=" -Xmx2g -Xms2g"
  6. Logstash accepts data via the lumberjack input plugin (with congestion_threshold => 15).
  7. A lot happens in the filter section of the config: it uses the grok, mutate, date, ruby, jdbc_streaming and jdbc_static plugins.
  8. Many devices send data. Each device can send up to 1000 events at a time, every 30 seconds.
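
To judge whether the pipeline could really be overloaded, it helps to turn item 8 into rates. The number of devices isn't stated, so `devices` is a placeholder. The sketch computes the average input rate, the average share per worker with `pipeline.workers: 8` (assuming the load spreads evenly across workers), and the worst-case burst if every device sends its batch at the same moment. The function names are hypothetical.

```python
def average_rate(devices, events_per_batch=1000, interval_s=30):
    """Average events per second if each device sends one batch per interval."""
    return devices * events_per_batch / interval_s


def per_worker_rate(devices, workers=8, events_per_batch=1000, interval_s=30):
    """Average events per second per pipeline worker, assuming an even spread."""
    return average_rate(devices, events_per_batch, interval_s) / workers


def worst_case_burst(devices, events_per_batch=1000):
    """Events arriving at once if every device sends its batch simultaneously."""
    return devices * events_per_batch
```

For a hypothetical 500 devices this gives about 16,667 events/s on average, about 2,083 events/s per worker, and a worst-case burst of 500,000 events. Comparing figures like these with the throughput that Logstash's node stats API reports during normal operation shows how much headroom the filters (especially jdbc_streaming lookups) really have.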

Thank you for any help!

I have had many of the same issues you are reporting.
Things that could help:

  • If you are using a memory queue, try a persistent queue.
  • Think about implementing a message queue (Redis, RabbitMQ, or Kafka).
    The workflow would be something like:
    Logstash --> message queue <--> another Logstash server (that does the processing/enriching) --> Elasticsearch
  • Install Metricbeat on your Logstash server to monitor usage and workflow.

Darrel, thank you for your response!
I've been considering Redis. I might end up using it eventually, but at the moment I'm trying to avoid adding any more components to our setup.

So you've had the same issues, and after spending some time looking for a solution you decided to add something that keeps Logstash from hanging.
Unfortunately, I conclude that you haven't found a way to solve it at the Logstash level, i.e. by unblocking the stuck Logstash threads (or whatever is actually stuck; that's just my assumption) :(.

Following your advice, I'm looking into how I can use Metricbeat...

I found a related topic, but it has no solution either.
