Logstash Hangs (thread blocked)

Hi Everyone,

We're running a Logstash pipeline with a TCP input > JSON filter > Elasticsearch output. Most of the time it works fine, but occasionally Logstash stops inserting data into Elasticsearch without crashing and without any log message we could find. It simply hangs and stops processing data.

The frequency of these 'freezes' varies, but they happen roughly every one or two days. We send data to the TCP input (over SSL) from about 10 devices, at around 3 logs/second.
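
In case it's useful, the next time it freezes we plan to check whether the input still completes a TLS handshake at all, with something like this (SERVER_IP is a placeholder, same as in the netstat output below):

# Check whether the TCP input on 5959 still accepts TLS connections during a freeze
$ openssl s_client -connect SERVER_IP:5959 </dev/null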

The monitoring API shows that a worker thread is sometimes in a blocked state, but it doesn't stay blocked.

$ curl -XGET 'localhost:9600/_node/hot_threads?pretty'
{
  "host" : "ns3038376",
  "version" : "6.3.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "8682ff3b-7c23-4200-81da-8c8a0344cb0e",
  "name" : "ns3038376",
  "hot_threads" : {
    "time" : "2018-06-27T17:14:21+02:00",
    "busiest_threads" : 3,
    "threads" : [ {
      "name" : "[main]<tcp",
      "thread_id" : 37,
      "percent_of_cpu_time" : 12.33,
      "state" : "runnable",
      "traces" : [ "sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)", "sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)", "sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)" ]
    }, {
      "name" : "LogStash::Runner",
      "thread_id" : 1,
      "percent_of_cpu_time" : 0.2,
      "state" : "timed_waiting",
      "traces" : [ "java.lang.Object.wait(Native Method)", "java.lang.Thread.join(Thread.java:1260)", "org.jruby.internal.runtime.NativeThread.join(NativeThread.java:76)" ]
    }, {
      "name" : "Ruby-0-Thread-11@[main]>worker6",
      "thread_id" : 35,
      "percent_of_cpu_time" : 0.17,
      "state" : "blocked",
      "path" : ":1",
      "traces" : [ "org.logstash.ackedqueue.Queue.readBatch(Queue.java:523)", "org.logstash.ackedqueue.ext.JRubyAckedQueueExt.readBatch(JRubyAckedQueueExt.java:121)", "org.logstash.ackedqueue.AckedReadBatch.<init>(AckedReadBatch.java:34)" ]
    } ]
  }
}
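
Next time it hangs we also plan to poll the pipeline statistics and compare the event counters (events.in / events.out) over a few minutes, to confirm that nothing is flowing through the workers. This is just the standard node stats endpoint, nothing custom:

$ curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'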

Looking at the server's connections, we can see a lot of sockets stuck in CLOSE_WAIT, but if I had to guess, I'd say they pile up because Logstash hangs and never closes them, so they would be a consequence of the problem rather than its cause.

tcp6       0      0 [::]:5959 SOME_CLIENT_IP:*                  LISTEN      logstash   5572793     8602/java       
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:50368    CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:50367    CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:56229 CLOSE_WAIT  root       0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:46117 SYN_RECV    logstash   0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:40401 CLOSE_WAIT  root       0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:47583 ESTABLISHED logstash   5925226     8602/java       
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:38020     SYN_RECV    logstash   0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:45042 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:50365    CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:58625 CLOSE_WAIT  root       0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:42624 SYN_RECV    logstash   0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:42621 SYN_RECV    logstash   0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:40400 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:43531 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:54918 CLOSE_WAIT  root       0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:60724 SYN_RECV    logstash   0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:47556 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:58623 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:45044 CLOSE_WAIT  root       0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:57615 SYN_RECV    logstash   0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:54921 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:56228 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:43532 CLOSE_WAIT  root       0           -               
tcp6       0      0 SERVER_IP:5959 SOME_CLIENT_IP:59742 SYN_RECV    logstash   0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:46493 CLOSE_WAIT  root       0           -               
tcp6     518      0 SERVER_IP:5959 SOME_CLIENT_IP:46491 CLOSE_WAIT  root       0           -               
...
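
For reference, we count those sockets with a quick one-liner (just filtering netstat output for the input port):

# Count CLOSE_WAIT sockets on the Logstash TCP input port
$ netstat -tan | grep ':5959' | grep CLOSE_WAIT | wc -l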

Do you have any ideas about what we could change or check to identify where the problem comes from?

Thanks.

Here is our configuration (I couldn't fit all three samples into a single message):

input {
    tcp {
        port => 5959
        ssl_enable => true
        ssl_verify => false
        ssl_cert => "${LOGSTASH_TLS_CERT}"
        ssl_key => "${LOGSTASH_TLS_KEY}"
    }
}

filter {
    # The TCP input puts each received line into [message]; parse it as JSON in place.
    json {
        source => "message"
        target => "message"
    }
    # The decoded document carries another JSON-encoded string in its own "message" field; parse that too.
    json {
        source => "[message][message]"
        target => "[message][message]"
    }
    mutate {
        remove_field => [ "[message][extra][func_name]", "[message][extra][line]", "[message][extra][path]" ]
    }
    # Default the fields used below in the index name and document_id when they are missing.
    if !([message][message][client_uid]) {
        mutate {
            add_field => { "[message][message][client_uid]" => 0 }
        }
    }
    if !([message][message][uid_mission_order]) {
        mutate {
            add_field => { "[message][message][uid_mission_order]" => 0 }
        }
    }
}

output {
    # Keep events that failed JSON parsing out of Elasticsearch and dump them to a file instead.
    if "_jsonparsefailure" in [tags] {
        file {
            path => "/var/log/logstash/json_parse_failure.json"
        }
    } else {
        elasticsearch {
            index => "%{[message][extra][logger_name]}_%{[message][message][client_uid]}_%{[message][message][uid_robot]}_%{[message][message][uid_mission_order]}"
            document_id => "%{[message][extra][logger_name]}_%{[message][message][client_uid]}_%{[message][message][uid_robot]}_%{[message][message][uid_mission_order]}_%{[message][message][time]}"
        }
    }
}
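
To give an idea of what the devices send, here is roughly what one event looks like on the wire. The field values below are made up, but the structure matches what the filter above expects: an outer JSON document whose "message" field is itself a JSON-encoded string.

{
  "extra": { "logger_name": "robot_logger", "func_name": "run", "line": 42, "path": "/app/main.py" },
  "message": "{\"client_uid\": 12, \"uid_robot\": 3, \"uid_mission_order\": 7, \"time\": \"2018-06-27T17:14:21+0200\"}"
}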
