How do I cluster Logstash?

I want to set up the Logstash configuration described in the docs, but I don't understand how to do it. Here's the sentence I'm struggling with:

you can set up a load balancer between your data source machines and the Logstash cluster

Yeah, OK, sounds great. But right now I've configured my one Logstash shipper instance to ship data from one logfile, and I start it with

    sudo service logstash start

What is this load balancer thing? How do I load balance a logfile? And how do I create/configure a Logstash cluster? I found the documentation easy to follow up until this last section, where there are no links to further explain how it's done. Frustrating.

My goal is to set up the structure in the very last image of that document: multiple Logstash shipper instances (a Logstash cluster) that work together to read the log file and send events to an MQ. The second part, reading from the MQ and sending to the ES cluster, is already done. But the shipper configuration I have today keeps crashing, so if I had multiple instances the shipping wouldn't stop working just because one instance crashes.

This is something external to Logstash. It could be a hardware or software load balancer; it could be HAProxy, nginx, or Apache. It could be an AWS ELB.
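For illustration, a minimal HAProxy sketch, with entirely hypothetical hostnames and ports: it assumes two Logstash instances that each listen for events on TCP port 5000, and it simply round-robins incoming connections between them.

```
# /etc/haproxy/haproxy.cfg -- sketch only; addresses and ports are assumptions
frontend logstash_in
    bind *:5000
    mode tcp
    default_backend logstash_nodes

backend logstash_nodes
    mode tcp
    balance roundrobin
    # two Logstash instances, each with a matching tcp input on port 5000
    server ls1 10.0.0.11:5000 check
    server ls2 10.0.0.12:5000 check
```

The data sources then send to the load balancer's address instead of to any single Logstash instance, which is what "a load balancer between your data source machines and the Logstash cluster" refers to.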

OK. I feel like I'm missing something.

I've attached a modified setup, which is a subset of the example from the documentation.

  • How do I get multiple logstash instances (start many on one EC2 instance or launch many EC2 instances with one logstash on each)? Is this what is meant by a cluster?
  • How do I get multiple logstashes to ship from the same local logfile, without interfering with each other?

You need to start multiple processes; you can do that on different hosts or on the same one, up to you.
That would be a cluster of processes, but not a real cluster, as they do not talk to each other.
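On a single host that could be as simple as the following (the config file names and install path are assumptions, not something from your setup):

```
# Two independent Logstash processes, each with its own config file
bin/logstash -f shipper-a.conf &
bin/logstash -f shipper-b.conf &
```

Each process is started and monitored independently; nothing coordinates them.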

For the second one, you probably don't want to do this, as you may end up with duplicates.

OK, so what you're saying is basically that my particular use case doesn't scale beyond this configuration, right?

It's difficult to say.
Can you take a step back and, without worrying about architecture, explain what you want to achieve?

Certainly. The problem is that our logstash keeps crashing. The error is

Exception in thread ">output" java.lang.UnsupportedOperationException
    at java.lang.Thread.stop(Thread.java:869)
    at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
    at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
    at java.lang.Thread.run(Thread.java:745)

It seems to happen randomly. I was thinking that the Logstash shipper is a single point of failure, so if I could make a cluster it would be more resilient to these crashes: we could restart a crashed Logstash instance before the crash of any single instance had an impact on the log shipping.

So, I want to achieve Logstash shipping that works. My solution for this was to remove any single points of failure. Another solution would be to figure out what's causing the error so that the single Logstash instance works reliably. But this seems to be a bug in Logstash, so I need to work around it.

OK, so go: source(s) > LS > broker > LS > ES.

Then you can run multiple instances of the second LS to process things.
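As a sketch of that pipeline, assuming Redis as the broker (the paths, hostnames, and key name are all hypothetical):

```
# shipper.conf -- first LS: read the file, push events to the broker
input {
  file { path => "/var/log/app/app.log" }
}
output {
  redis { host => "broker.example" data_type => "list" key => "logstash" }
}

# consumer.conf -- second LS: run as many copies of this as you like
input {
  redis { host => "broker.example" data_type => "list" key => "logstash" }
}
output {
  elasticsearch { host => "es.example" }
}
```

Because the consumers pull from a shared list on the broker, multiple consumer instances can run in parallel without duplicating events, which is why this stage scales while the file-reading shipper does not.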

Yeah, but the second LS isn't the problem, that's been working flawlessly. It's the first LS that's crashing.