I have a lot of syslogs flowing from multiple sources into four logstash nodes and then to Elastic. At this time I am just using DNS round robin to balance the Logstash nodes for the inbound syslog traffic.
I want to use something like HAProxy or NGINX to load balance the connections and provide failover with keepalived. I have both running, but I seem to have an issue with the rsyslog omfwd module on Linux losing its connection, which results in lost records. Neither feels stable. The issue seems worse with HAProxy but is present with both load balancing solutions.
I should also mention that each env (dev, test, qa, prod) uses a different pipeline port. This works fine in either solution. Also, we have systems using both RFC3164 and RFC5424.
I am looking for example configurations and advice on which way folks are going. I am also open to other ideas. Our LTM is off limits, so I have to go with these solutions or something else. I tried a ring buffer in HAProxy but could not make that work properly, as we have both RFCs. Admittedly, that might have been just me and my lack of understanding.
Thank you for the reply. I believe I am close, but I still cannot figure it out.
The Logstash side has custom pipelines with separate ports for our specific syslog needs. They work great when we point our syslog sources either directly to the Logstash hosts or to a DNS round-robin IP. With the HAProxy solution, I am using keepalived for failover between two HAProxy servers with identical configuration so the syslog sources point to the keepalived VIP.
The issue is with HAProxy, and more so with rsyslog, where I get the following error and subsequent data loss.
There is a long thread on GitHub about this behavior with rsyslog and multiple tools, so I don't think this is restricted to HAProxy.
It seems that the issue originates from the connection being closed between packets, so rsyslog needs to start a new connection. On the thread, some people solved it by enabling keepalive on the receiver side.
There is not much you can do on the Logstash side, but there is an option to enable TCP keepalive. You can try adding tcp_keep_alive => true in your input to see if it helps.
This is not an issue with Logstash, so I am not sure you will get much help here, as this forum is focused on Elastic tools, but maybe someone has a similar scenario and was able to solve it.
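For reference, here is a minimal sketch of where that setting would go, assuming you are using the plain tcp input plugin. The port number is a placeholder I made up, not something from your setup, so substitute the port of each of your pipelines.

```
input {
  tcp {
    # Hypothetical port; use the one your pipeline actually listens on.
    port => 5514
    # Ask the OS to send TCP keepalive probes on accepted connections.
    # The probe timing comes from the OS defaults, not from Logstash.
    tcp_keep_alive => true
  }
}
```

Since the keepalive timing comes from the operating system, it may also be worth comparing it against the idle timeouts configured on the load balancer, though I have not tested that in your scenario.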