Can someone tell me why this isn't communicating with my Elasticsearch cluster? If I put one host in the host field it communicates, but not with multiple hosts.
Try switching the protocol to 'http', which will be the default in Logstash 2.0. Debugging is MUCH easier, and performance is almost identical to transport.
Oh, and change the port to 9200 if you do that as well. Working with transport / node is much trickier! I only recommend it for ES experts at this point given the number of ways it can be accidentally misconfigured.
The use of multiple host entries (the array, as you have it) should round-robin between the different hosts, at least with protocol => http, and it should also do so with protocol => transport. However, as Andrew pointed out, we recommend not using the transport protocol, or even the node protocol. Testing has shown that the http output is as fast as node and transport, and can even be faster when round-robining like this. Starting with Logstash 2.0, node will no longer be the default protocol; http will be. Best to get used to that now.
Often, sadly, the logs will be silent when using the Transport/Node protocols unless a log4j.properties file is present. That is one of the reasons we're now recommending HTTP as the default, which does not have this problem.
So I used the http protocol and that works just fine. But we were looking into something that could load balance and found the node and transport protocols.
But are you guys saying that http load balances just like the node and transport protocols, and that they have the same performance? Are the node and transport protocols going to be deprecated?
Specifying multiple hosts in your config, e.g. host => ['host1', 'host2', 'host3'] will round robin bulk requests to each of these hosts. It will send 500 events to the first host, then 500 to the second, then the same with the third. As such, it is more efficient because it distributes the client load between 3 clients as opposed to one (each node is a single client). Using multiple hosts with protocol => node will result in multiple nodes being spun up. This is definitely suboptimal and a bad idea.
Distributing via round robin like this will give a performance boost over node protocol. The node client cannot do this. We've measured internally with single node and http clients and found that the http client is at least as performant as node, if not more so in many situations.
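To make the round-robin behaviour described above concrete, here is a minimal Python sketch. It is not Logstash's actual code: the helper names (batches, round_robin_bulk) are hypothetical, and the batch size of 500 and the host names are simply taken from the example in this thread. It only plans which host each bulk batch would go to; nothing is sent anywhere.

```python
from itertools import cycle, islice


def batches(events, size):
    """Yield consecutive lists of at most `size` events."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def round_robin_bulk(events, hosts, batch_size=500):
    """Pair each bulk batch with the next host in rotation.

    Hypothetical helper for illustration: it returns a plan of
    (host, batch) tuples rather than performing any network calls.
    """
    host_cycle = cycle(hosts)
    return [(next(host_cycle), batch) for batch in batches(events, batch_size)]


# 2000 stand-in events spread over the three hosts from the example.
plan = round_robin_bulk(range(2000), ["host1", "host2", "host3"])
for host, batch in plan:
    print(host, len(batch))
```

With 2000 events and three hosts, the first three batches go to host1, host2 and host3 in turn, and the fourth wraps back around to host1. Note that the rotation takes no account of how busy each host is, which is why this is distribution rather than true load balancing.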
On deprecation, the short answer is "probably," though not immediately. We are discussing this internally. We would like to eliminate the transport protocol in favor of http, as http is easier to secure. Node and transport both use the transport protocol (this is the Elasticsearch sense of the word, rather than the Logstash one). Node and transport will persist for now, but it would be wiser to switch to http sooner rather than later.
It's also important to understand that "load balancing" at the node/transport level is a bit different from what you may understand when hearing that phrase. There's distributing, and there's load balancing. By distributed, I mean that the documents are hashed and assigned a document id by Elasticsearch, and the shard number is determined by the result of a simple mathematical operation on that hash. The log line, or "document," goes to its calculated shard. That is distributed, not load balanced, though it may externally appear similar. Indeed, it may be considered a form of load balancing due to sharding, though this applies to all documents indexed, irrespective of how the documents got there.
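The "simple mathematical operation" on the hash can be pictured as a modulo over the number of primary shards. The Python sketch below illustrates only that idea. The function name shard_for is hypothetical, and zlib.crc32 is a stand-in hash chosen because it is deterministic and in the standard library; it is an assumption for illustration, not the hash function Elasticsearch itself uses.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Map a document id to a shard number via hash modulo shard count.

    zlib.crc32 is a stand-in hash (an assumption for this sketch);
    Elasticsearch uses its own hash function internally.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Spread 10,000 made-up document ids across 5 primary shards.
counts = Counter(shard_for(f"doc-{i}", 5) for i in range(10_000))
for shard in sorted(counts):
    print(f"shard {shard}: {counts[shard]} documents")
```

The same id always lands on the same shard, and the placement depends only on the id and the shard count. Nothing in the calculation looks at how loaded any node is, which is the sense in which documents are distributed rather than load balanced.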
When using protocol => node, Logstash launches a local Elasticsearch client-only node. Logstash puts all requests there, and they are distributed across the shards and nodes in your Elasticsearch cluster.
When using protocol => transport, Logstash sends the request to an Elasticsearch client node via the transport protocol. That client node distributes across the shards and nodes in your Elasticsearch cluster.
When using protocol => http, Logstash sends the request to an Elasticsearch client node via the http protocol. That client node distributes across the shards and nodes in your Elasticsearch cluster.
At no point do any of these options actually do load balancing, except when multiple host entries are present with either protocol => http or protocol => transport. Even then, it's strictly round-robin distribution of bulk queries around the clients.