Filebeat Load Balancing to Docker Logstash Containers

Greetings:

I have filebeat configured to load balance across 12 logstash docker containers on a different server. They all share the same IP address but use different ports. I only ever see one instance processing data from filebeat - the other 11 are always idle.

Is filebeat able to load balance across the same IP with different ports?

Here's my output section from filebeat.yml:

output.logstash:
  hosts:  ["10.0.0.99:5044","10.0.0.99:5045","10.0.0.99:5046","10.0.0.99:5047","10.0.0.99:5048","10.0.0.99:5049","10.0.0.99:5050","10.0.0.99:5051","10.0.0.99:5052","10.0.0.99:5053","10.0.0.99:5054","10.0.0.99:5055"]
  loadbalance: true

On the logstash side, all 12 of the docker containers use the same pipeline. In the input section I have this:

input {
	beats {
		port => 5054
	}
	beats {
		port => 5045
	}
	beats {
		port => 5046
	}
	beats {
		port => 5047
	}
	beats {
		port => 5048
	}
	beats {
		port => 5049
	}
	beats {
		port => 5050
	}
	beats {
		port => 5051
	}
	beats {
		port => 5052
	}
	beats {
		port => 5053
	}
	beats {
		port => 5044
	}
	beats {
		port => 5055
	}
}

Since I could never find any specific examples of how to craft the logstash pipeline for this, I took a guess and added beats inputs for all 12 ports.

If anyone can help me out I would appreciate it.

Kindest Regards

Ken

Hi @Krog,

Your output configuration for filebeat looks fine, but I don't completely understand why you need to open all the ports in all the logstash instances. Since they are all on the same machine, could it be that only one of them is actually listening on all the ports at 10.0.0.99?

That is the crux of my problem - I can't find any examples online of how to set up the logstash instances when filebeat is using the loadbalance setting.

One of my colleagues suggested that logstash can be clustered, i.e. two clusters of six logstash instances each, where the instances within each cluster talk to each other rather than filebeat talking to each individual logstash instance. The only information I found online about clustering logstash said it isn't supported - but that was from 2017, and I couldn't find anything more recent on the topic. While clustering looks good on paper, I can't find anywhere that explains how to actually build and implement it. Even your own documentation shows multiple instances of logstash, but there are no details anywhere (that I can find, at least) on how to actually configure them.

So I guess what I am really looking for is the logstash side of setting up filebeat for load balancing in a docker environment.

Thanks for the response - if there is documentation, or an actual deployment of a load-balanced filebeat pointing to multiple logstash instances, that you can point me to, I'd appreciate it.

Ken

No special configuration is needed on the logstash side. If you deploy multiple instances, you usually want them all to have the same configuration; they don't need anything special to work with the Beats load balancing setting.

Usually you don't run multiple logstash instances on the same host. To scale up, you deploy more servers with the same configuration and add them to the hosts list in your clients. If you see that a logstash instance could use more CPU, you can increase the number of workers with the pipeline.workers option.

There is no logstash "clustering". What you can have is multiple groups of logstash servers for different use cases; in that case you may have different configurations, one per use case, but all instances within the same group still share the same configuration. You can also build more complicated architectures with multiple tiers of logstash instances, but this is usually not needed to scale.
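As a minimal sketch of the pipeline.workers approach, something like this in logstash.yml would scale up a single instance (the values here are illustrative assumptions - tune them for your hardware and load):

```yaml
# logstash.yml - try scaling one instance before adding more
pipeline.workers: 8        # defaults to the number of CPU cores; raise if filters are the bottleneck
pipeline.batch.size: 250   # events each worker pulls from the input queue per batch (default 125)
pipeline.batch.delay: 50   # ms to wait for a full batch before flushing a smaller one
```

Larger batches generally improve throughput at the cost of per-event latency and memory, so it's worth watching the monitoring graphs while adjusting these.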

Hi @jsoriano

That's pretty much what I thought. The only other question I have is regarding the pipeline conf file.

My input section looks like this:

input {
	beats {
		port => 5054
	}
	beats {
		port => 5045
	}
	beats {
		port => 5046
	}
	beats {
		port => 5047
	}
	beats {
		port => 5048
	}
	beats {
		port => 5049
	}
	beats {
		port => 5050
	}
	beats {
		port => 5051
	}
	beats {
		port => 5052
	}
	beats {
		port => 5053
	}
	beats {
		port => 5044
	}
	beats {
		port => 5055
	}
}

Is this appropriate or is there a better way to do it?

Ken

In principle you don't need to open more than one port per instance.
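To make that concrete (a sketch, assuming every container uses the same pipeline file), each instance's input section could be reduced to a single beats input on the conventional Beats port:

```
input {
	beats {
		port => 5044
	}
}
```

Since every container then listens on the same internal port, the port mappings in docker-compose are what distinguish the instances on the host side.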

I don't think I am...? Here's my docker-compose file (note: due to the 7,000 character limit on posts, I only show the first 4 logstash instances here):

version: '2'
services:
  logstash1:
    container_name: logstash_01
    volumes:
      - '/home/logstash/settings/logstash.yml:/usr/share/logstash/config/logstash.yml'
      - '/home/logstash/settings/pipelines.yml:/usr/share/logstash/config/pipelines.yml'
      - '/home/logstash/settings/log4j2.properties:/usr/share/logstash/config/log4j2.properties'
      - '/home/logstash/pipeline/:/usr/share/logstash/pipeline/'
      - '/var/log/logstash/:/var/log/logstash/'
      - lsqueue01:/usr/share/logstash/queue/
    ports:
      - "5044:5044"
    expose:
      - 5044/tcp 
    image: 'docker.elastic.co/logstash/logstash:6.2.4'
    environment:
      - HOSTNAME=logstash_01

  logstash2:
    container_name: logstash_02
    volumes:
      - '/home/logstash/settings/logstash.yml:/usr/share/logstash/config/logstash.yml'
      - '/home/logstash/settings/pipelines.yml:/usr/share/logstash/config/pipelines.yml'
      - '/home/logstash/settings/log4j2.properties:/usr/share/logstash/config/log4j2.properties'
      - '/home/logstash/pipeline/:/usr/share/logstash/pipeline/'
      - '/var/log/logstash/:/var/log/logstash/'
      - lsqueue02:/usr/share/logstash/queue/
    ports:
      - "5045:5045"
    expose:
      - 5045/tcp 
    image: 'docker.elastic.co/logstash/logstash:6.2.4'
    environment:
      - HOSTNAME=logstash_02

  logstash3:
    container_name: logstash_03
    volumes:
      - '/home/logstash/settings/logstash.yml:/usr/share/logstash/config/logstash.yml'
      - '/home/logstash/settings/pipelines.yml:/usr/share/logstash/config/pipelines.yml'
      - '/home/logstash/settings/log4j2.properties:/usr/share/logstash/config/log4j2.properties'
      - '/home/logstash/pipeline/:/usr/share/logstash/pipeline/'
      - '/var/log/logstash/:/var/log/logstash/'
      - lsqueue03:/usr/share/logstash/queue/
    ports:
      - "5046:5046"
    expose:
      - 5046/tcp 
    image: 'docker.elastic.co/logstash/logstash:6.2.4'
    environment:
      - HOSTNAME=logstash_03

  logstash4:
    container_name: logstash_04
    volumes:
      - '/home/logstash/settings/logstash.yml:/usr/share/logstash/config/logstash.yml'
      - '/home/logstash/settings/pipelines.yml:/usr/share/logstash/config/pipelines.yml'
      - '/home/logstash/settings/log4j2.properties:/usr/share/logstash/config/log4j2.properties'
      - '/home/logstash/pipeline/:/usr/share/logstash/pipeline/'
      - '/var/log/logstash/:/var/log/logstash/'
      - lsqueue04:/usr/share/logstash/queue/
    ports:
      - "5047:5047"
    expose:
      - 5047/tcp 
    image: 'docker.elastic.co/logstash/logstash:6.2.4'
    environment:
      - HOSTNAME=logstash_04

volumes:
  lsqueue01:
    driver: local
  lsqueue02:
    driver: local
  lsqueue03:
    driver: local
  lsqueue04:
    driver: local
  lsqueue05:
    driver: local
  lsqueue06:
    driver: local
  lsqueue07:
    driver: local
  lsqueue08:
    driver: local
  lsqueue09:
    driver: local
  lsqueue10:
    driver: local
  lsqueue11:
    driver: local
  lsqueue12:
    driver: local

And here is my logstash.yml file:

node.name: ${HOSTNAME}
queue.type: persisted
log.level: debug
path.logs: /var/log/logstash
xpack.monitoring.elasticsearch.url: ["http://10.0.0.100:9200"]

All 12 instances are using the same master.conf file (that's where the input section I already posted is from).

It's now working like this, I just wanted to know if I am doing this the really convoluted hard way or is there something I missed that would simplify the whole thing.

Ken

Oh, I see - even though you open 12 ports in every container, you only expose one per container. You could instead have a single beats input with the default configuration and change the port forwarding in the docker-compose file, so that logstash1 has ports: ["5044:5044"], logstash2 has ports: ["5045:5044"], and so on.
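A trimmed sketch of that mapping for the first two services (volumes and other settings omitted; assumes every container's pipeline has a single beats input on 5044):

```yaml
version: '2'
services:
  logstash1:
    image: 'docker.elastic.co/logstash/logstash:6.2.4'
    ports:
      - "5044:5044"   # host port 5044 -> container port 5044
  logstash2:
    image: 'docker.elastic.co/logstash/logstash:6.2.4'
    ports:
      - "5045:5044"   # host port 5045 -> the same container port 5044
```

The hosts list in filebeat.yml stays exactly as you have it; only the host-side ports differ, while every container runs an identical pipeline.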

But indeed this looks a bit convoluted 🙂 I still wonder why you are trying to start so many logstash instances on the same machine. It should only be needed if you have slow outputs or slow filters (using grok, for example). In general, to scale, start by increasing the number of workers, and only if that still can't handle the load, try starting more instances.

I have only been using ELK for about the last five months. I was tasked with bringing up an ELK stack for an internal project at work, and until then I'd never heard of Elasticsearch, so the 'convoluted' method is due to my lack of experience with the product. Add to that the fact that I only started using X-Pack last week to monitor the stack, and I am still working through interpreting what all the graphs and charts mean.

I am at a point in the development loop where I'm trying to improve performance, and that's also a new topic for me. Your advice to increase the workers rather than adding instances is well taken - that will be my focus now. From what I've read there's a lot of ways to slice and dice performance settings - hopefully I'll be able to get something deployed soon that will work for our teams.

Thank you very much for your help on this - I really appreciate it.

My next task is to open a new forum topic to help me figure out why I am seeing _rubyexception errors. So much to learn!

Cheers!

Ken

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.