Do we have data loss when Logstash is unable to send logs to HAProxy because the HAProxy server is unavailable?

Hi All,

I am looking for some information on data/log loss in the situation described below.

I have a centralized management cluster to which my Logstash nodes are connected, and logs come into Logstash from Beats agents and syslog agents.

My architecture:

I have 4 Logstash nodes pointed at 3 Elasticsearch master nodes in the management cluster, and the pipelines are managed from the Kibana UI.

For resiliency, the servers at each tier are created in different data centers.

I am using separate HAProxy instances for the input and output flows, and each HAProxy has both the primary and DR sites connected behind a Virtual IP (VIP).

I am not writing logs to Elasticsearch; I am only using Logstash to parse logs and send them on to the output destinations.

Log flow

Beats agents -> Logstash -> HAProxy output -> Multiple output destinations

Syslog agents -> HAProxy input -> Logstash -> HAProxy output -> Multiple output destinations

Given the above architecture, I want to understand the data loss in the scenarios below.

  1. When Logstash is down: Filebeat will stop sending data and will resend the older data once Logstash is back up and running. This is fine for me.
  2. My main concern is when the VIP is down on the output side, so Logstash cannot send any data to the HAProxy output. In this case:
    a) Do we have complete data loss, since the logs have already been shipped to Logstash from the Beats agents?
    b) Is there any impact on the Logstash nodes or services?
    c) Does Logstash retain the data in persistent queues, or somewhere else, until the VIP comes back up and data transfer resumes?
    d) Or would there be complete data loss in this situation?

Please advise on my situation, and correct me in case I have missed anything.

Thanks.

Logstash has an at-least-once delivery model. If an output is unavailable it will queue data, and back pressure will eventually prevent the inputs from reading additional events. If you have persistent queues the data will be queued on disk; if not, it will be queued in memory and will be lost if Logstash is restarted.
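If you want the on-disk behaviour, something like this in logstash.yml should enable it; the path and size values here are illustrative placeholders, not recommendations:

```yaml
# logstash.yml -- persistent queue settings (values are illustrative)
queue.type: persisted                  # default is "memory"; "persisted" stores the queue on disk
path.queue: /var/lib/logstash/queue    # assumed path; defaults to <path.data>/queue
queue.max_bytes: 4gb                   # on-disk capacity; back pressure starts once the queue is full
queue.checkpoint.writes: 1024          # events between checkpoints (durability vs. throughput trade-off)
```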

Thanks for the response Badger.

My persistent queues are on disk only. But will that cause any issue for the Logstash service if the queue size gradually increases?

So does Logstash stop reading/receiving logs from the Beats agents if it is unable to ship to the output? If so, for how much time, or up to what size, will Logstash save data in the persistent queue before it stops receiving logs from Beats?

Yes. How long it takes will depend on the size of the persistent queue that you have configured.
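As a back-of-the-envelope sketch (the event size and ingest rate below are assumptions, not measurements, so substitute your own numbers):

```yaml
# Rough sizing estimate for the persistent queue (all numbers assumed):
#   average event size : ~1 KB
#   inbound rate       : ~2,000 events/s  -> ~2 MB/s written to the queue
#   queue.max_bytes    : 4gb              -> 4,096 MB / 2 MB/s ≈ 2,048 s ≈ 34 minutes
# With these numbers the queue absorbs roughly half an hour of output outage
# before back pressure stops the inputs from accepting new events.
queue.type: persisted
queue.max_bytes: 4gb
```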

Thanks for the confirmation @Badger

Hi Badger,

I am new to HAProxy, and I wanted to check one thing here.

In the architecture below, suppose the output servers (in my case a SIEM) are down and unable to receive the logs that Logstash sends through HAProxy.

Beats agents -> Logstash -> HAProxy output -> Multiple output destinations (SIEM)

  1. Can HAProxy keep a buffer of the failed logs, just like the Logstash queues?
  2. If HAProxy cannot buffer them, do we lose those logs, since they have already been sent from Logstash to HAProxy (HAProxy itself being available)?

It would be very helpful if you could advise me on this. Thanks.

I do not know how HAProxy behaves.
