We have an ELK cluster with 3 hosts on Ubuntu 14.04 (Logstash 2.4.1 + Elasticsearch 2.4.1 on each host, in Docker containers), with Logstash configured to receive gelf and syslog/TCP logs.
Sometimes some gelf logs are lost.
With tcpdump on the server, I see that the UDP packet reaches the host
With netstat -c --udp -an | grep 12201, I see that the UDP receive queue is always empty (the exact commands are sketched below)
Logstash is using almost no CPU (~15% of one core)
The indexing queue of ES is empty
The volume of logs saved to ES is pretty low (< 1000/s)
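For reference, these are roughly the checks behind the points above (the interface name is just an example):

```
# Confirm the gelf/UDP datagrams actually arrive on the Logstash host
# (eth0 is an example interface)
sudo tcpdump -ni eth0 udp port 12201

# Watch the receive queue of the UDP socket bound to 12201
# (Recv-Q stays at 0 the whole time)
netstat -c --udp -an | grep 12201

# Kernel-wide UDP counters; "packet receive errors" / "receive buffer errors"
# growing here would point at drops caused by a full socket buffer
netstat -su
```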
The problem with UDP is that it's not a lossless protocol, so you may lose data. In your tcpdump capture you see the packet sent from the client, but the server will never acknowledge it (unlike SYN/ACK in TCP).
Sometimes some gelf logs are lost.
How many packets are lost? How do you calculate or approximate that?
What version of Logstash are you running?
What is the server OS?
There are a few things to look for to reduce that data loss:
Make sure Logstash handles the log and is not blocked (which appears to be the case in your description)
It's possible to tweak the UDP buffers at the OS level to help with data loss; I don't have a lot of experience with that, but see the sketch below.
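As a rough sketch (the values are only examples, not recommendations), the OS-level tuning would look like this:

```
# Raise the ceiling and the default size of UDP receive buffers.
# rmem_max only raises the limit; a socket still gets rmem_default
# unless the application asks for more via SO_RCVBUF.
sudo sysctl -w net.core.rmem_max=26214400
sudo sysctl -w net.core.rmem_default=26214400

# Make it persistent across reboots
echo 'net.core.rmem_max=26214400'     | sudo tee -a /etc/sysctl.conf
echo 'net.core.rmem_default=26214400' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```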
@pierhugues tcpdump was running on the server, not on the client.
I don't know how many logs I lose; I just try to send one packet (roughly as sketched below), and sometimes it appears in Elasticsearch, sometimes not.
I run Logstash 2.4.1 in a Docker container on Ubuntu 14.04 (Linux 4.4.0).
Logstash does not seem to handle the lost log (it appears neither in ES nor on stdout), but it handles other logs before and after the lost one.
I already increased rmem_max to 256 MB, but it does not seem to change anything.
@pierhugues with Syslog over TCP, I don't have this issue. I thought it was network related, but according to tcpdump, the UDP packets always reach the host where Logstash is located.
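In case it helps to reproduce: this is more or less how I send the single test packet (the hostname is a placeholder, and the flags may differ between netcat variants). GELF over UDP is just a JSON document, optionally gzip- or zlib-compressed:

```
# Send one gzip-compressed GELF message to the Logstash gelf input
# ("logstash-host" is a placeholder)
echo -n '{"version":"1.1","host":"test-client","short_message":"gelf loss test","level":6}' \
  | gzip -c \
  | nc -u -w1 logstash-host 12201
```

When this shows up in Elasticsearch for some runs and not for others, while tcpdump on the Logstash host sees every datagram, the loss has to happen somewhere between the kernel socket buffer and the gelf input.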