Do we have data loss when Logstash is unable to send logs to HAProxy because the HAProxy server is unavailable?

Hi All,

I am looking for some information on data/log loss in the situation described below.

I have a centralized management cluster to which my Logstash nodes are connected, and logs come into Logstash from Beats agents and syslog agents.

My architecture:

I have 4 Logstash nodes pointed at 3 Elasticsearch master nodes in the management cluster, and the pipelines are managed from the Kibana UI.

For resiliency, the servers at each tier are created in different data centers.

I am using separate HAProxy instances for the input and output flows, and each HAProxy has both the primary and DR sites connected behind a Virtual IP (VIP).

I am not writing logs to Elasticsearch; I am only using Logstash to parse logs and send them on to the output destinations.

Log flow

Beats agents -> Logstash -> HAProxy output -> Multiple output destinations

Syslog agents -> HAProxy input -> Logstash -> HAProxy output -> Multiple output destinations

Given the above architecture, I want to understand the data loss in the scenarios below.

  1. When Logstash is down: Filebeat will stop sending data and will resend the older data once Logstash is back up and running. This is fine for me.
  2. My main concern is when the VIP is down on the output side, so Logstash cannot send any data to the HAProxy output. In this case:
    a) Do we have complete data loss, since the logs have already been shipped to Logstash from the Beats agents?
    b) Is there any impact on the Logstash nodes or services?
    c) Does Logstash retain the data in persistent queues, or somewhere else, until the VIP comes back up and data transfer resumes?
    d) Or would there be complete data loss in this situation?

Please advise on my situation, and correct me in case I have missed anything.

Thanks.

Logstash has an at-least-once delivery model. If an output is unavailable it will queue data, and back pressure will eventually prevent the inputs from reading additional events. If you have persistent queues the data will be queued on disk; if not, it will be queued in memory and will be lost if Logstash is restarted.
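If you want the on-disk behaviour, something like this in logstash.yml should enable it; the path and size values here are illustrative placeholders, not recommendations:

```yaml
# logstash.yml -- persistent queue settings (values are illustrative)
queue.type: persisted                  # default is "memory"; "persisted" stores the queue on disk
path.queue: /var/lib/logstash/queue    # assumed path; defaults to <path.data>/queue
queue.max_bytes: 4gb                   # on-disk capacity; back pressure starts once the queue is full
queue.checkpoint.writes: 1024          # events between checkpoints (durability vs. throughput trade-off)
```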

Thanks for the response Badger.

My persistent queues are on disk only. But will that cause any issue for the Logstash service if the queue size gradually increases?

So does Logstash stop reading/receiving logs from the Beats agents if it is unable to ship to the output? If so, for how much time, or up to what size, will Logstash save data in the persistent queue before it stops receiving logs from Beats?

Yes. How long it takes will depend on the size of the persistent queue that you have configured.
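As a back-of-the-envelope sketch (the event size and ingest rate below are assumptions, not measurements, so substitute your own numbers):

```yaml
# Rough sizing estimate for the persistent queue (all numbers assumed):
#   average event size : ~1 KB
#   inbound rate       : ~2,000 events/s  -> ~2 MB/s written to the queue
#   queue.max_bytes    : 4gb              -> 4,096 MB / 2 MB/s ≈ 2,048 s ≈ 34 minutes
# With these numbers the queue absorbs roughly half an hour of output outage
# before back pressure stops the inputs from accepting new events.
queue.type: persisted
queue.max_bytes: 4gb
```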

Thanks for the confirmation @Badger

Hi Badger,

I am new to HAProxy, and I wanted to check one thing here.

In the architecture below, suppose the output servers (in my case a SIEM) are down and unable to receive the logs that Logstash sends through HAProxy.

Beats agents -> Logstash -> HAProxy output -> Multiple output destinations (SIEM)

  1. Can HAProxy keep a buffer of the failed logs, just like the Logstash queues?
  2. If HAProxy cannot buffer them, do we lose those logs, since they have already been sent from Logstash to HAProxy (HAProxy itself being available)?

It would be very helpful if you could advise me on this. Thanks.

I do not know how HAProxy behaves.
