Transport protocol not working for multiple hosts

bennett · September 28, 2015, 8:23pm

Can someone tell me why this isn't communicating with my Elasticsearch cluster? If I put one host in the host field it communicates, but not with multiple.

output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM.dd}"
    host => [ "xxx.xxx.xxx.10", "xxx.xxx.xxx.11", "xxx.xxx.xxx.12", "xxx.xxx.xxx.13", "xxx.xxx.xxx.14", "xxx.xxx.xxx.15", "xxx.xxx.xxx.16", "xxx.xxx.xxx.17", "xxx.xxx.xxx.18" ]
    cluster => "cluster1"
    port => "9300"
    sniffing => true
    protocol => "transport"
    workers => 5
  }
}

magnusbaeck · September 29, 2015, 5:53am

What do the logs say? I'd be surprised if they were silent.

andrewvc · September 29, 2015, 8:30am

Try switching the protocol to 'http', which will be the default in Logstash 2.0. Debugging is MUCH easier, and performance is almost identical to transport.

andrewvc · September 29, 2015, 8:32am

Oh, and change the port to 9200 if you do that as well. Working with transport / node is much trickier! I only recommend it for ES experts at this point given the number of ways it can be accidentally misconfigured.
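For reference, a minimal sketch of Andrew's two suggestions applied to the original config, assuming the Logstash 1.5.x option names (`host`, `protocol`, `port`) already used in the first post. `cluster` and `sniffing` are dropped on the assumption that they only matter for the node/transport protocols; check the elasticsearch output documentation for your version before relying on that.

```conf
output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM.dd}"
    # Multiple hosts: bulk requests are round-robined across these entries.
    # Shortened here; list all of your nodes as in the original config.
    host => [ "xxx.xxx.xxx.10", "xxx.xxx.xxx.11", "xxx.xxx.xxx.12" ]
    # Use the HTTP API instead of the Elasticsearch transport protocol
    protocol => "http"
    # HTTP listens on 9200; 9300 is the transport port
    port => "9200"
    workers => 5
  }
}
```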

theuntergeek · September 29, 2015, 12:11pm

@bennett

The use of multiple host entries (the array, as you have it) should round-robin between different hosts, at least with protocol => http, and it should also do so with protocol => transport. However, as Andrew pointed out, we recommend against using the transport protocol, or even the node protocol. Testing has shown that the http output is as fast as node and transport, and can even be faster when round-robining like this. Starting with Logstash 2.0, the default will be http rather than the node protocol. Best to get used to that now.

andrewvc · September 29, 2015, 12:43pm

Oftentimes, sadly, the logs will be silent when using the Transport/Node protocols unless a log4j.properties file is present. That is one of the reasons we're now recommending HTTP as the default: it does not have this problem.

bennett · September 29, 2015, 3:01pm

So I used the http protocol, and that works just fine. But we were looking for something that could load balance and found the node and transport protocols.

But are you guys saying that http load balances just like the node and transport protocols, and that they have the same performance? Are the node and transport protocols going to be deprecated?

theuntergeek · September 29, 2015, 3:11pm

Specifying multiple hosts in your config, e.g. host => ['host1', 'host2', 'host3'], will round-robin bulk requests to each of these hosts. It will send 500 events to the first host, then 500 to the second, then 500 to the third. This is more efficient because it distributes the client load across three clients as opposed to one (each node is a single client). Using multiple hosts with protocol => node, however, will result in multiple nodes being spun up. This is definitely suboptimal and a bad idea.
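A worked version of the round-robin described above, as an idealized model: it assumes a single worker, a batch size of $b = 500$ events (the figure quoted here), and $n$ entries in `host`, indexed from 0. Event number $i$ (counting from 0) falls into batch $\lfloor i/b \rfloor$, and that batch is sent to host index

$$
h(i) = \left\lfloor \frac{i}{b} \right\rfloor \bmod n, \qquad b = 500
$$

With $n = 3$: events 0 to 499 go to host 0, 500 to 999 to host 1, 1000 to 1499 to host 2, and event 1500 starts over at host 0, since $\lfloor 1500/500 \rfloor \bmod 3 = 3 \bmod 3 = 0$. With several workers the interleaving is less tidy, but each host still receives roughly $1/n$ of the bulk requests.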

Distributing via round robin like this will give a performance boost over node protocol. The node client cannot do this. We've measured internally with single node and http clients and found that the http client is at least as performant as node, if not more so in many situations.

As for deprecation, the short answer is "probably," though not immediately. We are discussing this internally. We would like to eliminate the transport protocol in favor of http, as http is easier to secure. Node and transport both use the transport protocol (in the Elasticsearch sense of the word, rather than the Logstash one). Node and transport will persist for now, but it would be wiser to switch to http sooner rather than later.

theuntergeek · September 29, 2015, 3:24pm

It's also important to understand that "load balancing" at the node/transport level is a bit different from what you may understand when hearing that word. There's distributing, and there's load balancing. By distributing, I mean that each document is assigned a document ID by Elasticsearch and hashed, and the shard number is determined by a simple mathematical operation on that hash. The log line, or "document", goes to its calculated shard. That is distribution, not load balancing, though the two may look similar from the outside. Indeed, sharding may be considered a form of load balancing, though it applies to all indexed documents irrespective of how they got there.
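The "simple mathematical operation" is, as described in the Elasticsearch 1.x/2.x documentation on routing, a modulo over the number of primary shards, where the routing value defaults to the document's `_id` (newer Elasticsearch versions refine this formula, so treat it as a sketch for this era):

$$
\text{shard} = \operatorname{hash}(\texttt{\_routing}) \bmod N_{\text{primary}}, \qquad \texttt{\_routing} = \texttt{\_id} \text{ by default}
$$

Nothing in this formula depends on which node received the request or which Logstash protocol delivered it, which is why the same document ID always lands on the same shard of a given index no matter how it got there.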

When using protocol => node, Logstash launches a local Elasticsearch client-only node. Logstash puts all requests there, and they are distributed across the shards and nodes in your Elasticsearch cluster.

When using protocol => transport, Logstash sends the request to an Elasticsearch client node via the transport protocol. That client node distributes across the shards and nodes in your Elasticsearch cluster.

When using protocol => http, Logstash sends the request to an Elasticsearch client node via the http protocol. That client node distributes across the shards and nodes in your Elasticsearch cluster.

At no point do any of these options actually do load balancing, except when multiple host entries are present with either protocol => http or protocol => transport. Even then, it's strictly round-robin distribution of bulk queries around the clients.

bennett · September 29, 2015, 3:35pm

Oh wow! Thanks for clarifying and the help! I really appreciate it. This is definitely great and useful information.
