I'm currently running a single Logstash instance and am interested in running several, both for resilience in case one goes down and for better performance. I'm currently ingesting data with the http input plugin.
I'm thinking about using the Kinesis input plugin to accomplish this. From my understanding (from reading past posts), if I set the application_name in the Logstash input to the same value on two separate machines, that will automatically work as a load balancer. Am I correct?
Will it both distribute the load between the two Logstash instances and also handle the case where one of them goes down?
A common way to do this is using a message queue like Kafka.
You would send your data to Kafka and configure your Logstash nodes to read from it using the same group id. That way, if one Logstash node goes down, the others keep consuming.
But this depends on how you ingest data and whether you can change that.
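As a rough sketch of that setup, the same kafka input could be placed on every Logstash node; the broker addresses, topic, and group name below are assumptions for illustration:

```
# Identical input block on each Logstash node.
# Kafka assigns partitions across all consumers sharing the same group_id,
# so the nodes split the load and take over for each other on failure.
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # assumed broker list
    topics            => ["logs"]                   # assumed topic name
    group_id          => "logstash-indexers"        # same value on all nodes
    codec             => "json"
  }
}
```

With this in place, adding capacity is just starting another Logstash node with the same config (up to the number of partitions on the topic).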
Never used Kinesis, so I'm not sure if it will do what you want.
In the case of the kafka input in Logstash, there is an option that tells the Kafka brokers that all the consumers are part of the same consumer group, so if one of the nodes goes down the other ones will get the messages and you will not have duplicated messages.
If the application_name option in Kinesis works the same way, then it will load balance the messages between your nodes without duplicating them, but you will need to test this.
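For reference, a Kinesis-based equivalent might look like the sketch below, using the logstash-input-kinesis plugin; the stream name, application name, and region are placeholders, and the load-balancing behaviour across nodes is exactly the part you would need to verify:

```
# Same block on both machines. The plugin uses application_name as the
# name of its DynamoDB checkpoint table, so consumers sharing it should
# coordinate shard leases the way a Kafka consumer group shares partitions.
input {
  kinesis {
    kinesis_stream_name => "my-log-stream"       # assumed stream name
    application_name    => "logstash-consumers"  # same value on both nodes
    region              => "us-east-1"           # assumed region
  }
}
```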
This is only true if everything supports and is configured for "exactly-once" semantics. Otherwise Kafka provides only "at least once" guarantees. Under normal circumstances you will not have duplicates, but it is possible.
Anyway, the likelihood of having duplicates when using Kafka and Logstash is pretty low. I've been running many pipelines with this configuration and have never hit a situation that caused a duplicated message.
But as rcowart said, it can happen.
In my case, in all those years I never had the need to tackle a problem that, for me, didn't exist.
The use of self-generated IDs is the option I would also recommend. However, I would point you to the UUID filter as an option; it may need to be installed as a plugin after installing Logstash.
A common architecture combining Logstash and Kafka is: Logstash shippers → Kafka → Logstash indexers → Elasticsearch.
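A minimal sketch of the UUID filter approach, assuming the standard elasticsearch output (hosts and index name here are placeholders): the filter stamps each event with an ID, and the output uses it as the document _id so that reprocessing the same event overwrites rather than duplicates.

```
filter {
  uuid {
    target    => "[@metadata][uuid]"  # store the ID in metadata, not the document body
    overwrite => false                # keep an existing ID if one is already set
  }
}

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]      # assumed cluster address
    index       => "logs-%{+YYYY.MM.dd}"          # assumed index pattern
    document_id => "%{[@metadata][uuid]}"         # self-generated ID as _id
  }
}
```

Note that for the replay scenario to deduplicate correctly, the ID has to be attached before the event enters Kafka (i.e. on the shipper tier), or be derived deterministically from the event content (the fingerprint filter); a fresh UUID generated on the indexer after a replay would produce a new _id each time.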
Each of these tiers can be scaled independently for performance or redundancy.
While there is an indexing efficiency penalty with self-generated IDs, the benefit when using them with Kafka is that you can at any point "replay" the data through updated pipelines and easily replace the existing documents in Elasticsearch.