Need to load balance Logstash

Using self-generated IDs is the option I would also recommend. However, I would point you to the UUID filter as a way to generate them. It is a plugin, so it may need to be added after installing Logstash.
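As a rough sketch of what that could look like on the collecting tier: the plugin is published as logstash-filter-uuid and, if it is not already present, can be added with `bin/logstash-plugin install logstash-filter-uuid`. The field name `event_id` below is just a placeholder I picked, not something the plugin requires.

```
filter {
  uuid {
    # Field that will hold the generated ID ("event_id" is a hypothetical name)
    target    => "event_id"
    # Keep an ID that is already on the event, so replayed events
    # are not assigned a new one
    overwrite => false
  }
}
```

The ID is stored in a regular field rather than under @metadata because it has to survive serialization into Kafka and be available to the later tiers.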

A common architecture combining Logstash and Kafka is:

collect (apply UUID here) --> Kafka --> processing --> Kafka --> outputs

Each of these tiers can be scaled independently for performance or redundancy.

While self-generated IDs carry an indexing efficiency penalty, the benefit of using them with Kafka is that you can "replay" the data through updated pipelines at any point and easily replace the existing documents in Elasticsearch.
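A minimal sketch of the output tier, assuming the ID was stored in a field called `event_id` on the collecting tier. The broker address, topic, consumer group, host and index names are all placeholders for illustration.

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"         # placeholder broker address
    topics            => ["processed-events"] # placeholder topic name
    group_id          => "logstash-outputs"   # placeholder consumer group
    codec             => json
  }
}

output {
  elasticsearch {
    hosts       => ["http://elasticsearch:9200"] # placeholder host
    index       => "events"                      # placeholder index name
    # Reuse the self-generated ID as the document ID, so a replayed
    # event overwrites the existing document instead of creating a duplicate
    document_id => "%{event_id}"
  }
}
```

The replacement only happens if the replayed event is written to the same index as the original, so a date-based index name would need to resolve to the same value on replay.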