Load balancing Shipper/Indexer

Hello,
I am using the following event processing pipeline in my project:
LSF --> Logstash shipper (scaled) --> Redis --> Indexer (scaled) --> NGINX --> multiple ES nodes

Here, I want to make the shipper and indexer highly available. I was referring to this link: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
I have a doubt about diagram 7 on that page.


I am not clear on how load balancing is achieved here. The diagram shows scaling, but what takes care of the load balancing? In my case, for example, NGINX's upstream declaration load balances the ES cluster and also prevents duplicates.
Can anybody please elaborate on the diagram and answer my questions?

If LSF1 ships logs to shipper-instance-1, then how can I make this instance highly available?
In case that instance fails, I want a similar instance to take over the logs from LSF1 without ever creating duplicate messages in the message queue. How can this be achieved?

br,
Sunil

If you look two paragraphs above the diagram it says:

To make your Logstash deployment more resilient to individual instance failures, you can set up a load balancer between your data source machines and the Logstash cluster. The load balancer handles the individual connections to the Logstash instances to ensure continuity of data ingestion and processing even when an individual instance is unavailable.

I think the diagram just doesn't show the load balancer they talk about. We use something similar to this; just swap the web servers for Logstash instances. The nice thing about this is that your data sources don't have to care what your Logstash instances are named, or how many there are. You can add and remove instances without having to touch your data sources.
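Since you already run NGINX, one way to sketch such a load balancer is an NGINX TCP `stream` block in front of the shippers (LSF speaks the Lumberjack protocol over TCP). The hostnames and port below are assumptions for illustration, and this requires NGINX built with the stream module:

```
# Hypothetical nginx.conf fragment: load balance LSF (Lumberjack/TCP)
# connections across two Logstash shipper instances.
stream {
    upstream logstash_shippers {
        server shipper1.example.com:5043;
        server shipper2.example.com:5043;
    }

    server {
        listen 5043;
        proxy_pass logstash_shippers;
    }
}
```

With this in place, LSF points at the load balancer's address only, so if one shipper goes down, new connections fail over to the other instance.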

If you want to avoid duplicate messages, the easiest solution is to use Elasticsearch's unique "_id". The shipper must create a unique ID from each message (for example, a hash of its contents); then even if two indexers index the same document, Elasticsearch will not duplicate it.
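As a sketch of that approach (field names are my own; adjust to your pipeline), the shipper can hash each event with the fingerprint filter, and the indexer can use that hash as the Elasticsearch document ID so a re-indexed event overwrites rather than duplicates:

```
# Shipper side: add a hash of the message as a regular field
# (a regular field survives the trip through Redis).
filter {
  fingerprint {
    source => "message"
    target => "fingerprint"
    method => "SHA1"
    key    => "any-long-random-string"   # arbitrary; some plugin versions require a key
  }
}

# Indexer side: use the hash as the document _id.
output {
  elasticsearch {
    document_id => "%{fingerprint}"
  }
}
```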

You don't need NGINX in front of Elasticsearch, since the Logstash elasticsearch output plugin can load balance across nodes itself via its hosts option (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-hosts).
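For instance (hostnames assumed), listing all your nodes in hosts lets the output distribute requests across the cluster:

```
output {
  elasticsearch {
    hosts => ["es1.example.com:9200", "es2.example.com:9200", "es3.example.com:9200"]
  }
}
```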

Load balancing can be achieved by running several indexer instances. I recommend using RabbitMQ instead of Redis: it is a much better queue, with a clean web UI where you can track queue stats (input/output rates), and you can spawn indexer instances on demand!
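A rough sketch of that setup (host, exchange, and queue names are examples): the shipper publishes to a RabbitMQ exchange, and any number of indexers consume from the same queue, so RabbitMQ distributes messages among them automatically:

```
# Shipper side: publish events to RabbitMQ.
output {
  rabbitmq {
    host          => "rabbitmq.example.com"
    exchange      => "logstash"
    exchange_type => "direct"
    key           => "logs"
    durable       => true
  }
}

# Indexer side: every indexer instance reads from the same queue,
# so adding instances scales out consumption.
input {
  rabbitmq {
    host    => "rabbitmq.example.com"
    queue   => "logstash"
    key     => "logs"
    durable => true
  }
}
```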