It's also important to understand that "load balancing" at the node/transport level is a bit different from what you might expect when you hear the term. There is distributing, and there is load balancing. By distributed, I mean that each document is assigned a document ID by Elasticsearch and hashed, and the shard number is determined by the result of a simple mathematical operation on that hash. The log line, or "document", goes to its calculated shard. That is distribution, not load balancing, though it may look similar from the outside. It could be considered a form of load balancing because of sharding, but it applies to every document indexed, irrespective of how the document got there.
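To make the "hash, then simple math" idea concrete, here is a minimal sketch. It assumes the operation is a modulo over the number of primary shards, and it uses `zlib.crc32` purely as a stand-in hash; Elasticsearch uses its own hash function internally, so the shard numbers below will not match a real cluster. The names `pick_shard` and `distribute` are hypothetical, made up for this illustration.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number: hash it, then take a modulo."""
    # Stand-in hash for illustration only; not the hash Elasticsearch uses.
    digest = zlib.crc32(doc_id.encode("utf-8"))
    return digest % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one is calculated to land on."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # Hypothetical document IDs; the same ID always maps to the same shard.
    for shard, ids in distribute([f"doc-{i}" for i in range(10)], 3).items():
        print(shard, ids)
```

Note that nothing here looks at how busy a shard or node is. The placement depends only on the document ID and the shard count, which is why I call it distribution rather than load balancing.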
With protocol => node, Logstash launches a local Elasticsearch client-only node. Logstash sends all requests to that local node, which distributes them across the shards and nodes in your Elasticsearch cluster.
With protocol => transport, Logstash sends the request to an Elasticsearch client node via the transport protocol. That client node distributes it across the shards and nodes in your Elasticsearch cluster.
With protocol => http, Logstash sends the request to an Elasticsearch client node via HTTP. That client node distributes it across the shards and nodes in your Elasticsearch cluster.
At no point do any of these options actually do load balancing, except when multiple host entries are present with either protocol => http or protocol => transport. Even then, it's strictly round-robin distribution of bulk requests around the clients.
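As a rough illustration of what "strictly round-robin" means, the sketch below hands each bulk request to the next host in the list and wraps around at the end. This is not Logstash's actual code; the function name `round_robin_assign` and the host and request names are hypothetical, chosen only for the example.

```python
from itertools import cycle


def round_robin_assign(bulk_requests, hosts):
    """Pair each bulk request with the next host in turn, ignoring host load."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_cycle = cycle(hosts)  # endlessly repeats the host list in order
    return [(next(host_cycle), request) for request in bulk_requests]


if __name__ == "__main__":
    # Hypothetical hosts and bulk requests.
    for host, request in round_robin_assign(["bulk-1", "bulk-2", "bulk-3"], ["es1", "es2"]):
        print(host, request)
```

The rotation never checks whether a host is slow or overloaded, so a struggling client gets exactly the same share of requests as a healthy one.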