Hi,
I have 60+ clients that I am trying to connect to my logstash instance. They are sending logs on a short interval using LSF (Lumberjack).
Versions:
Logstash 2.0
Elasticsearch 2.0
Kibana 4.2
Logstash-forwarder 0.4.0
Logstash setup:
Port 5000 is used to receive data from all LSF clients.
20 workers
5g heap
input = lumberjack on port 5000
filters = basic grok and timestamp filters
output = file with filters + elasticsearch
The problem:
When I start up logstash, all of the clients establish their connections, but after a delay (~15 seconds, the default LSF timeout period) the majority of them change state to CLOSE_WAIT or SYN_RECV. A handful remain ESTABLISHED and continue to send data without any problems. At best, only 5-9 connections stay ESTABLISHED and are processed correctly. It is not always the same connections out of the 60 that survive, so I have ruled out specific LSF clients as the cause.
There is data in the RECV_Q of the CLOSE_WAIT connections. i.e. RECV_Q is not zero.
The CLOSE_WAIT and SYN_RECV connections have no PID attached (PID is '-' when running netstat -p as root) so I cannot close them manually as far as I am aware.
The connections that are in SYN_RECV on the ELK server show as ESTABLISHED on the client side and have data in their SEND_Q.
The logstash log intermittently shows: :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection."
If this occurs I simply restart logstash. It does not happen consistently. I don't notice any discernible difference between runs when this line appears in the log and when it does not.
Running tcpdump on port 5000 shows constant traffic, even between logging intervals when there should be none. I am assuming it is clients attempting to connect.
Every client that isn't ESTABLISHED on the main ELK server usually has multiple duplicate connections showing in netstat,
e.g. one client could have 3 CLOSE_WAIT connections with data in their RECV_Q plus a SYN_RECV.
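In case it helps anyone reproduce the numbers above, here is a rough sketch of how the connection states can be tallied from netstat output. It assumes Linux net-tools column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function names and the script name in the usage line are made up for illustration.

```python
import sys
from collections import Counter, defaultdict

PORT = 5000  # lumberjack input port from the setup above


def parse_netstat(text, port=PORT):
    """Return (remote_host, state, recv_q, send_q, owner) tuples for
    TCP sockets whose local port is `port`. Listening sockets are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        _, recv_q, send_q, local, remote, state = fields[:6]
        if state == "LISTEN" or local.rsplit(":", 1)[-1] != str(port):
            continue
        # netstat -p prints "-" when no process owns the socket.
        owner = fields[6] if len(fields) > 6 else "-"
        host = remote.rsplit(":", 1)[0]
        rows.append((host, state, int(recv_q), int(send_q), owner))
    return rows


def summarize(rows):
    """Count sockets per state and list clients with more than one socket."""
    by_state = Counter(state for _, state, _, _, _ in rows)
    per_client = defaultdict(Counter)
    for host, state, _, _, _ in rows:
        per_client[host][state] += 1
    duplicates = {h: dict(c) for h, c in per_client.items()
                  if sum(c.values()) > 1}
    return by_state, duplicates


if __name__ == "__main__":
    rows = parse_netstat(sys.stdin.read())
    by_state, duplicates = summarize(rows)
    unowned = sum(1 for r in rows if r[4] == "-")
    queued = sum(1 for r in rows if r[2] > 0)
    print("states:", dict(by_state))
    print("sockets with no PID:", unowned)
    print("sockets with data in Recv-Q:", queued)
    for host, states in sorted(duplicates.items()):
        print("duplicate connections from", host, states)
```

Run as root so the PID column is populated, e.g. `netstat -tanp | python3 conn_states.py`.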
LSF is also installed on the main ELK server itself. Its connection is usually in the ESTABLISHED state, but there is usually data in its SEND_Q that is not being processed.
Before LSF was rolled out to all clients it was tested using 5 clients of different OS and worked fine consistently.
Originally I was running logstash with the default config. After increasing the heap from 500m to 5g, and again after increasing the worker count, I initially saw more ESTABLISHED connections, but logstash eventually kills off connections and stabilizes at around 3 ESTABLISHED.
Can logstash handle this many connections?
Any help on this would be much appreciated.
Thanks.