High-pressure pipeline causes orphaned TCP/SSL connections

It appears that Logstash is not adhering to the part of the TLS specification that allows half-closed connections, and is instead orphaning TCP sockets when it cannot send its TCP FIN packet to the client. This can happen when there is sufficient back pressure on the pipeline (such as during an Elasticsearch outage or resource exhaustion).

You can find the relevant specification on connection shutdown at: SSL Shutdown
You can find the relevant TLS 1.2 specification on half-closed connections at: RFC 5246
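For context, the half-close behaviour in question can be illustrated with plain TCP sockets (a minimal Python sketch, not Logstash code): when the peer sends its FIN, the receiving side sees EOF on read, but the kernel keeps that socket in CLOSE_WAIT until the application itself calls close(). A side that never closes after EOF is exactly what produces the stuck sockets described below.

```python
import socket
import threading

def demo_half_close():
    """Show that the peer's FIN surfaces as EOF (recv() returning b'') and
    that the local side must still call close() to leave CLOSE_WAIT."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def client():
        c = socket.socket()
        c.connect(("127.0.0.1", port))
        c.sendall(b"payload")
        c.shutdown(socket.SHUT_WR)  # send FIN: the connection is now half-closed
        c.close()

    t = threading.Thread(target=client)
    t.start()
    conn, _ = server.accept()
    chunks = []
    while True:
        chunk = conn.recv(1024)
        if not chunk:          # b'' means the peer's FIN arrived
            break
        chunks.append(chunk)
    # Between the FIN arriving and this close(), the kernel holds the socket
    # in CLOSE_WAIT; an application that skips this close() leaks the socket.
    conn.close()
    t.join()
    server.close()
    return b"".join(chunks)

print(demo_half_close())
```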

Relevant Logstash Configuration:

beats {
    port => 22223
    tags => [ "data-probe" ]

    # encryption settings
    ssl                         => true
    ssl_certificate             => "PATH_TO_SSL_CERT"
    ssl_key                     => "PATH_TO_PUBLIC_KEY"
    ssl_certificate_authorities => [ "PATH_TO_ROOT_CERTIFICATE" ]
    ssl_verify_mode             => "none"
}

Using the command `watch -n 1 'sudo netstat -tp | grep CLOSE_WAIT | sort'`, we can see that the number of sockets sitting in the CLOSE_WAIT state climbs indefinitely until we exhaust either the Logstash heap or the maximum number of allowed file descriptors.
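The same tally can be produced programmatically. This is a hypothetical helper, written for illustration, that counts CLOSE_WAIT entries per remote peer from `netstat -t`-style output, equivalent to the grep/sort pipeline above:

```python
from collections import Counter

def close_wait_by_peer(netstat_output: str) -> Counter:
    """Tally CLOSE_WAIT sockets per remote address from `netstat -tp`-style text."""
    peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # netstat -t columns: proto, recv-q, send-q, local addr, remote addr, state
        if len(fields) >= 6 and fields[5] == "CLOSE_WAIT":
            peers[fields[4]] += 1
    return peers

sample = """\
tcp6 133 0 SM2.node.william.:22223 172.16.164.95:58952 CLOSE_WAIT
tcp6 1 0 SM2.node.william.:22223 GB1.node.william.:42870 CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 GB1.node.william.:50626 CLOSE_WAIT
"""
print(close_wait_by_peer(sample))
```

Watching the per-peer counts over time makes it easier to tell whether one shipper or all shippers are leaking connections.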

Even worse, these connections are NOT cleaned up when issuing a SIGTERM to the Logstash process in question. They are orphaned and will never be cleaned up short of a hard reset of the machine. In the sample netstat output, each orphaned connection has between 1 and 133 bytes in its receive queue that are never flushed.
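On the file-descriptor side, the per-process cap that the leaked sockets eventually exhaust can be inspected from Python (a Unix-only sketch; for the Logstash process itself you would check `ulimit -n` or /proc/<pid>/limits instead):

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors; every
# orphaned CLOSE_WAIT socket consumes one slot until this cap is reached.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"fd limit: soft={soft} hard={hard}")
```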

Eventually this also occurs even when the pipeline is under little-to-no pressure. Once the following error has been thrown, we start to see the connection count climb again:

2017-11-30T02:03:45.088Z DEBUG application:read_stderr:206 b"[nioEventLoopGroup-8-56] WARN    io.netty.channel.AbstractChannelHandlerContext - An exception 'java.lang.NullPointerException' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:\n"
2017-11-30T02:03:45.088Z DEBUG application:read_stderr:206 b'io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem\n'

tcp6 133 0 SM2.node.william.:22223 172.16.164.95:58952   CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 GB1.node.william.:44582 CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 172.16.110.14:57366   CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 172.16.110.14:52972   CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 172.16.110.14:51678   CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 172.16.110.14:53448   CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 172.16.110.14:46798   CLOSE_WAIT
tcp6   1 0 SM2.node.william.:22223 GB1.node.william.:42870 CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 GB1.node.william.:50626 CLOSE_WAIT
tcp6   1 0 SM2.node.william.:22223 172.16.110.14:39048   CLOSE_WAIT
tcp6 133 0 SM2.node.william.:22223 172.16.110.14:40578   CLOSE_WAIT

This bug causes our Logstash process to slow to a crawl and eventually throw errors such as:

"UNEXPECTED POOL ERROR" : No Available Connections

Does anyone have any guidance regarding this problem? It is severely affecting our ability to send encrypted data to Logstash.
