Too many open files. Logstash 6.1


(Иван Бубнов) #1

Hey, folks! Please help me solve this problem.
Logstash 6.1
Have this in logs:
......
[2017-12-25T21:23:21,010][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.8.0_151]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[?:1.8.0_151]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[?:1.8.0_151]
at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135) ~[logstash-input-tcp-5. 0.2.jar:?]
at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:75) [logstash-input-tc p-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:571) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:512) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:426) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:398) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:877) [logstash-input-tcp-5.0.2.jar :?]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) [logstash-input- tcp-5.0.2.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2017-12-25T21:23:22,013][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.8.0_151]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[?:1.8.0_151]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[?:1.8.0_151]
at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135) ~[logstash-input-tcp-5. 0.2.jar:?]
at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:75) [logstash-input-tc p-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:571) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:512) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:426) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:398) [logstash-input-tcp-5.0.2.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:877) [logstash-input-tcp-5.0.2.jar :?]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) [logstash-input- tcp-5.0.2.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
......

after this logstash is dying. Every 1-2 hours.

ulimit -n = 500000 | for service MAX_OPEN_FILES = 500000
LS config is:
path.data: /var/lib/logstash
pipeline.workers: 4
pipeline.output.workers: 4
path.config: /etc/logstash/conf.d/*.conf
config.reload.automatic: true
config.reload.interval: 3600
path.logs: /var/log/logstash

2 TCP inputs with json codec, 8 conditions in filter (just mutate type field if in JSON present key-field), 8 conditions in output section ( checking type and send to ES by elasticsearch-output-plugin with special index, sniffing disabled(!))

I was forced to create a crontab task to restart logstash every hour! Now its working, but its not a solution.


(Vlad Vasiliu) #2

Hello,

I have been having the same problem since I upgraded to Logstash 6.
Nothing changed in the configuration and in the usage between the upgrades. At the moment the whole pipeline is version is 6.1.1.
The platform is running on AWS EC2 on an up to date Ubuntu 16.04.3 / Linux 4.4.0-104-generic.

Layout is as follows:

Producer 1 ---             --- logstash 1 ---             --- ElasticSearch 1
...           |--- ELB ---|                  |--- ALB ---|--- ElasticSearch 2
Producer N ---             --- logstash 2 ---             --- ElasticSearch 3

There is an ALB between Logstash and ES because they are in different availability zones and the ALB does SSL termination.
Producers are web frontends and are auto-scaled.

Here is the logstash configuration. All other settings are at default values: https://pastebin.com/v2BZEZWS

The pipeline is here: https://pastebin.com/W0gBUQ6j

I suspect this problem is related to the TCP input. I have to other Logstash servers running in another AZ and they don't have any problems, their TCP connections stay low. They are running the exact same configuration, only difference is nothing uses their TCP inputs. ElasticSearch output is done via the same ALB. They receive their inputs via an ELB too.

There are no related errors in the logstash logs: https://pastebin.com/b77xiGqN

I have enabled trace logging for the TCP input and ES output as per https://www.elastic.co/guide/en/logstash/current/logging.html but I don't get anything in the log file. There are however events being processed, around 300/s, mostly via the TCP inputs.

Below are the TCP connection graphs, as seen by Linux on the Logstash server as well as the ELB graphs from AWS.
The server only runs logstash and prometheus-node-exporter. TCP connection timezone is UTC+1 whereas ELB stats are UTC.

The server was last restarted yesterday around 9:00 UTC+1. Both servers exhibit the same rise in TCP connections around the same time.

Below is processed events for the same server as taken from X-Pack monitoring (times are UTC+1):

Below is the connection rate on the ELB in front of the producers. This is a proxy for producer activity (times are UTC):

Best regards,
Vlad


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.