Logstash pipeline high availability

Hello everyone,

How can I achieve HA for Logstash pipelines?
I have 3 nodes; if a node that is running a number of pipelines goes down, what happens?
What are the solutions to achieve HA?

Logstash does not have any built-in features to enable HA, and each Logstash instance is independent from the others.

To have some kind of HA with Logstash you will need to use a third-party tool like HAProxy to load balance the requests between your nodes.
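As a rough illustration, a minimal HAProxy sketch could look like the following. All hostnames and ports here are assumptions, and note that open-source HAProxy has traditionally load balanced TCP/HTTP rather than UDP, so this assumes a TCP input (e.g. Beats on port 5044):

```
# Hypothetical HAProxy sketch: distribute TCP traffic across three
# Logstash nodes, with health checks so a down node stops receiving.
frontend logstash_in
    bind *:5044
    mode tcp
    default_backend logstash_nodes

backend logstash_nodes
    mode tcp
    balance roundrobin
    server ls1 logstash-1:5044 check
    server ls2 logstash-2:5044 check
    server ls3 logstash-3:5044 check
```

The `check` option makes HAProxy probe each backend and remove it from rotation while it is unreachable.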

So if the pipeline goes down, there is no solution except a third-party load balancer?

As I said, Logstash has no built-in HA; to have some kind of HA you need third-party tools like a load balancer.

If a Logstash pipeline goes down for some reason, you will need to find the cause and fix it. Depending on how you are collecting data, having multiple nodes behind a load balancer can help you, but you still need to find the cause and fix it.

I understand you, but in my case the Logstash pipeline receives a large amount of data per second on a UDP port. If the pipeline goes down, I will lose a large amount of data. Isn't that unreasonable?

You will need to configure your sources to send to the load balancer, which will then forward to your Logstash nodes, but you may still have some data loss.

This is expected, even more so when using UDP.

So I want to handle this case.
I have put forward two solutions to overcome it:

  • Run the same pipeline on each node: I will face data duplication, because the UDP protocol does not send acknowledgments.
  • Run a cron job script that starts the pipeline when the other pipeline goes down: I will lose data during the time it takes to start the pipeline on the other node.

Do you have any idea how to solve this?

Not sure how this would fix the issue. What is sending data over UDP? Is it a network device? Depending on the source, you may not be able to send data to multiple destinations.

This will not solve your issue either: the minimum interval for a cron job is 1 minute by default, and when you add a new pipeline to Logstash it takes some time to start up, so you may end up losing data for a couple of minutes in this case.

A load balancer like HAProxy replaces the two things you proposed above: you can configure it to send to multiple Logstash instances without duplicating data, and it checks whether an instance is down and stops sending data to it much faster than a cron job would.

Depending on what is sending the data, you may put Kafka in between to decouple the shipping of the data from its consumption.

This is what I use: I have a load balancer that sends the data to two Logstash instances whose only function is to write the data to Kafka topics, and then I have other Logstash instances consuming the data from those Kafka topics.
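A minimal sketch of that two-tier layout could look like this, assuming hypothetical topic names, ports, and broker addresses (none of these come from the thread):

```
# Shipper pipeline (the instances behind the load balancer):
# receive UDP and only forward to a Kafka topic.
input {
  udp {
    port => 5144
  }
}
output {
  kafka {
    bootstrap_servers => "kafka-1:9092,kafka-2:9092"
    topic_id => "udp-raw"
  }
}
```

```
# Consumer pipeline (separate Logstash instances): read from the same
# topic. Consumers sharing one group_id split the partitions between
# them, so events are processed once rather than duplicated.
input {
  kafka {
    bootstrap_servers => "kafka-1:9092,kafka-2:9092"
    topics => ["udp-raw"]
    group_id => "logstash-consumers"
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
  }
}
```

Kafka buffers the data, so if a consumer instance goes down the events wait in the topic instead of being dropped at the UDP socket.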

Having no data loss in this use case is really not an easy problem to solve; you may end up with a lot of pieces in your infrastructure and spend a lot of money, and still get some data loss.

You mean running the same Logstash pipeline on more than one node, writing to the same index, and then using a load balancer to send data to the first node, so that once it goes down the load balancer moves the traffic to the second node?

It depends on what you want; you can configure a load balancer to work this way, with one active server and backup servers that only receive data in case of failure.

But you can also configure it to distribute the request between the servers.
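For the active/backup variant, a hypothetical HAProxy backend sketch (hostnames and ports are assumptions) might look like:

```
# Only ls1 receives traffic; ls2 and ls3 are marked "backup" and take
# over only when the active server fails its health checks.
backend logstash_failover
    mode tcp
    server ls1 logstash-1:5044 check
    server ls2 logstash-2:5044 check backup
    server ls3 logstash-3:5044 check backup
```

Dropping the `backup` keywords turns the same backend into the distributing setup instead.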


Doesn't this approach cause duplicated data?

No, this is how load balancers work: you configure them to distribute requests between the servers using some algorithm like round robin or least connections.

With round robin, for example, a request goes to server 1, the next request goes to server 2, the following one goes to server 3; if you have only 3 servers, the next request goes back to server 1, and so on.

If clients are using keepalive connections then the load balancer is actually distributing connections, not requests. There are cases where that is important, such as where a server goes down, and all of the clients establish new connections before it comes back up. It will then get zero requests until one of the persistent connections times out.

As mentioned, you could use a Kafka cluster and use the same configs on all 3 Logstash instances, pulling from the same topics. That achieves some degree of high availability and decouples shipping and distribution of the data, unless you want to go for the load balancer option. I prefer Kafka.