Logstash crashing with "Too many files open"

(Vincent Tran) #1

I have two logstash nodes running with 8 cores each. However, they consistently crash (every 1 or 2 days) with "Too many files open". I've only been able to find cases of this message on ES. Has anyone encountered this on logstash?

-bash-4.2# ulimit -n

I worry that this has to do with the performance of my grok filters. Any help or input on how to investigate or remedy this would be appreciated.

(Steffen Winther Sørensen) #2

Try to track the number of open FDs:

ls /proc/<pid>/fd/ | wc -l

Maybe you need more logstash resources (CPU, open file descriptors, memory, NIC bandwidth...), for example from more nodes running your filtering.

Try to track and verify that your logstash really uses that many FDs (it most likely does, as the error shows :). Then try to raise the ulimit for this process/user/system (see this link), or add more resources, i.e. more processes. If you have CPU and memory to spare, you could run more processes on the same node listening on different input ports, or add an MQ in front of your logstash nodes...
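A minimal sketch of that check, assuming a Linux host with /proc mounted. The function names are made up for illustration, and the demo line runs against the current shell rather than a real logstash process:

```shell
#!/bin/sh
# Print the number of open file descriptors for a PID (Linux /proc only).
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit that applies to a running PID.
# In /proc/<pid>/limits the 4th field is the soft limit, the 5th the hard limit.
soft_nofile_limit() {
  awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Demo against the current shell. In practice, substitute the logstash PID,
# e.g. the first result of `pgrep -f logstash` (that pattern is an assumption
# about how the process shows up on your system).
echo "open fds: $(count_fds "$$"), soft limit: $(soft_nofile_limit "$$")"
```

If the first number keeps climbing toward the second, raising the limit only buys time and the real question is what is holding the descriptors open.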

(Vincent Tran) #3

I have run logstash with this in /etc/sysconfig/logstash


However, logstash eventually crashed (after 3-4 days) with the same "Too many files open" message in logstash.err. So it seems that increasing the open files limit only delays the inevitable.

(Matthew Prinvale) #4

I'm having the same issues here.

Everything latest (Logstash 2.1.0, ES 2, etc). I noticed Logstash was crashing after a while even after changing my ulimit from 1024 to 64,000. I was thinking about changing it to unlimited, but this looks to be a bug so I'm glad I didn't.

When I did an lsof I saw tens of thousands of these:

<beats     3311 3575   logstash  196u     IPv6            6016017      0t0        TCP redacted:33002->redacted:9200 (ESTABLISHED)
LogStash:  3311 3572   logstash   42u     IPv6            5071865      0t0        TCP redacted:32880->redacted:9200 (ESTABLISHED)

It seems like Logstash isn't closing them. Is there a workaround until a fix is in place?

(Vincent Tran) #5

I fixed it by turning off sniffing (in the elasticsearch output). logstash was keeping the TCP sockets open and consuming file descriptors rapidly.

(Matthew Prinvale) #6

Nice! I just checked and mine is most certainly enabled (true). What are the complications of changing this boolean?

(Vincent Tran) #7

It allows logstash to "sniff" the cluster and discover other ES nodes that it can potentially forward events to. By disabling it, you are telling logstash to only use the ES nodes specified in the hosts field. Sniffing is probably not all that useful unless your ES cluster is super active in terms of horizontal scaling (i.e. you add and remove ES nodes a lot).
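For illustration, an elasticsearch output with sniffing disabled might look roughly like this. It is only a sketch: the host name is a placeholder, not taken from anyone's setup in this thread, and any other options you already have in the block stay as they are.

```
output {
  elasticsearch {
    # Placeholder address (assumption): point this at your own ES node(s)
    # or load balancer.
    hosts => ["es.example.internal:9200"]
    # Only use the hosts listed above; do not discover other cluster nodes.
    sniffing => false
  }
}
```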

(Matthew Prinvale) #8

Great! I'm pointing to a load balancer anyways! Glad I'm working this out in dev! haha. I made the change so fingers crossed

(Vincent Tran) #9

No need to cross fingers. You can confirm as soon as logstash is restarted with sniffing disabled.

lsof | grep "logstash.*TCP" | wc -l

Run it a few seconds apart (about 5s would do). If the number is not growing rapidly, you are golden.
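If you would rather not rerun it by hand, the same check can be wrapped in a small loop. This is a rough sketch; the function names and the default of 5 samples are arbitrary choices, not anything logstash provides:

```shell
#!/bin/sh
# Count lines of lsof-style output (read from stdin) that look like
# logstash TCP sockets. Same pattern as the one-liner above.
count_logstash_tcp() {
  grep -c "logstash.*TCP"
}

# Print a timestamped socket count several times, a few seconds apart.
# $1 = number of samples (default 5), $2 = seconds between samples (default 5).
sample_logstash_sockets() {
  samples="${1:-5}"
  interval="${2:-5}"
  i=1
  while [ "$i" -le "$samples" ]; do
    printf '%s %s\n' "$(date +%T)" "$(lsof 2>/dev/null | count_logstash_tcp)"
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Usage (run as a user allowed to see the logstash process):
#   sample_logstash_sockets 5 5
```

A count that stays roughly flat across samples means the sockets are no longer piling up.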
