I am looking for information on data/log loss in the situation described below.
I have a centralized management cluster that my Logstash nodes connect to, and logs arrive at Logstash from Beats agents and syslog agents.
I have 4 Logstash nodes pointed at the 3 Elasticsearch master nodes of the management cluster, and the pipelines are managed from the Kibana UI.
I have resiliency at each tier, with servers created in different data centers.
I am using separate HAProxy instances for the input flow and the output flow, and each HAProxy has both the primary and the DR site connected through a Virtual IP (VIP).
I am not writing logs to Elasticsearch; I am only using Logstash to parse logs and send them to the output destinations.
Beats agents -> Logstash -> HAProxy output -> Multiple output destinations
Syslog agents -> HAProxy input -> Logstash -> HAProxy output -> Multiple output destinations
Given the above architecture, I would like to understand the data loss in the scenarios below.
- When Logstash is down - Filebeat will stop sending data and will send the older data once Logstash is up and running again - This is fine for me.
- My main concern is when the VIP is down on the output side - Logstash will not be able to send any data to the HAProxy output. In this case:
a) Is there complete data loss, given that the logs have already been shipped from the Beats agents to Logstash?
b) Is there any impact on the Logstash nodes or services?
c) Does Logstash retain/save the data in persistent queues or somewhere else until the VIP comes back up and data transfer resumes?
d) Or would the data be lost completely in this situation?
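For reference, this is the kind of persistent queue setting I am asking about in (c). It is only a sketch based on my reading of the Logstash settings documentation; the path and size are placeholder values that I have not configured or tested, and since my pipelines are managed centrally from Kibana, I am assuming the equivalent per-pipeline settings could be used there instead.

```yaml
# logstash.yml (sketch only, values are assumptions)
queue.type: persisted                # default is "memory"; "persisted" buffers events on disk
path.queue: /var/lib/logstash/queue  # placeholder directory for the queue files
queue.max_bytes: 4gb                 # placeholder cap on the on-disk queue size
```

My understanding is that once such a queue fills up, Logstash stops accepting new events from its inputs, so I would also like to know what happens to the syslog traffic in that case.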
Please advise on my situation, and correct me if I have missed anything.