Same HTTP health check succeeds for ALB and fails for NLB


#1

Hi,
I am trying to set up Logstash behind a load balancer, which plenty of other people have mentioned doing and which seems trivial. With an ALB, the health check against the monitoring API on port 9600 passes, but I need to send traffic to the Beats port, which isn't HTTP, so that traffic fails to go through. With an NLB, the check always fails, whether I use a TCP or an HTTP health check on 9600 or a TCP check on 5044 (for Beats).
I have no issues curling or netcat-ing port 9600 from outside the instance.
Does anyone know why this is? What am I missing?

Thanks!


(Ry Biesemeyer) #2

First, the monitoring endpoint is bound to local interfaces only by default for security reasons; the endpoints perform no authentication or authorization, so you expose them to the wider network at your own peril.


What, specifically, are you attempting to load-balance, and how does that map to your pipeline configuration?

Many Logstash input plugins listen for inbound traffic over various protocols, and each input is going to need its own tailor-fit load-balancer configuration matching the protocol's characteristics.

For example, both the HTTP and Beats inputs ultimately operate over TCP, but consider:

  • HTTP: many short-lived connections
  • TCP/Beats: few long-lived connections

Since the load balancer directs each new connection to an available host, the profile of the HTTP input gives it more frequent opportunities to balance the load, where the Beats and raw-TCP inputs have fewer opportunities and are therefore more susceptible to uneven distributions.
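To make the difference concrete, here is a minimal simulation sketch. It assumes the load balancer picks a host uniformly at random for each new connection, which is a simplification (real load balancers use flow hashing or other algorithms), and the function names `simulate` and `imbalance` are made up for this illustration.

```python
import random


def simulate(num_connections, num_hosts, seed=0):
    """Assign each new connection to a uniformly random host.

    Returns a list with the number of connections each host received.
    Uniform random choice is an assumption, not how any specific
    load balancer is documented to behave.
    """
    rng = random.Random(seed)
    counts = [0] * num_hosts
    for _ in range(num_connections):
        counts[rng.randrange(num_hosts)] += 1
    return counts


def imbalance(counts):
    """Busiest host's connection count divided by the ideal even share.

    1.0 means perfectly even; len(counts) means one host got everything.
    """
    ideal = sum(counts) / len(counts)
    return max(counts) / ideal


# Few long-lived connections (Beats-like) vs. many short-lived ones (HTTP-like).
few = simulate(num_connections=6, num_hosts=3)
many = simulate(num_connections=100_000, num_hosts=3)
print("few connections: ", few, "imbalance:", round(imbalance(few), 2))
print("many connections:", many, "imbalance:", round(imbalance(many), 2))
```

With only a handful of connections, a single unlucky placement shifts a large fraction of the load to one host; with many connections the counts converge on an even split.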


#3

Thanks Ry for your response. I am aware of the risks of exposing the monitoring endpoint (or any of the Elastic endpoints) and am relying on strict NACLs and security groups across multiple subnets to prevent anyone from gaining access. My concern with load balancing is less about meeting demand and more about ensuring high availability. I would like to put Logstash in an ASG fronted by an ELB so that the instances I am monitoring will always have a Logstash available to process logs and forward them on.

At this point I have multiple VPCs and will soon have multiple accounts with many instances. These instances have Filebeat and Metricbeat shipping logs and metrics to Logstash. If I list the Logstash instances in the Beats config by DNS name (the best other option as I see it), then I have to manually manage those instances, fix or replace them if something goes wrong, and then update DNS.

I understand the issue with Beats using persistent connections and load balancing, but for now I am less concerned with it.


(Ry Biesemeyer) #4
  • How is the NLB configured to perform a healthcheck against the Logstash API, and how does that fail?
  • Does your NLB have a route to your Logstash host(s)?
  • From a machine that does have a route to your Logstash host(s), what is the output of the healthcheck?
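For the last question, a small probe like the following can reproduce both kinds of check from any machine. This is only a sketch: the helper names `tcp_check` and `http_check` are invented here, and requesting the root path of the monitoring API on port 9600 is an assumption about what the health check is configured to hit.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_check(host, port, timeout=3.0):
    """Return True if a TCP connection can be opened to host:port.

    This approximates a TCP health check: it only tests reachability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port=9600, path="/", timeout=3.0):
    """Return the HTTP status code for a GET request, or None if unreachable.

    Port 9600 is the monitoring API port from the thread; the path is an
    assumption and should match the configured health check path.
    """
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, unroutable, or malformed response.
        return None
```

For example, with a hypothetical host name, `tcp_check("logstash.example.internal", 5044)` and `http_check("logstash.example.internal")` show whether the Beats port and the monitoring API are reachable from where the probe runs. A probe that succeeds from one subnet but fails from the load balancer's subnets points at routing or firewall rules rather than at Logstash itself.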

Putting Logstash in an auto-scaling group can have some tricky consequences, especially with inputs that don't apply back-pressure and acknowledge each message or chunk of messages. Beats does, but you'll still want to avoid features like Persistent Queues and the Dead Letter Queue, since they rely on the state of files on disk and are therefore vulnerable to data loss each time the ASG expunges a node.

In previous scenarios where I've needed failover, we used consul's DNS-based routing; with a super short TTL, the "warm" host would be promoted to "hot" as soon


#5

You were right. I didn't have my security groups configured correctly. This was my first time using an NLB, and I didn't realize I needed to either allow traffic from everything in the NLB node subnets or explicitly add the node IPs. Thanks for your help and patience.

