I have filebeat configured to load balance to 12 logstash docker containers on a different server. They all share the same IP address but utilize different ports. I only ever see one instance that is processing data from filebeat - the other 11 are always idle.
Is filebeat able to load balance across the same IP with different ports?
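For reference, my filebeat output looks roughly like this (a sketch; the IP 10.0.0.99 comes up later in this thread, and the port list matches the Logstash inputs below). Note that `loadbalance` defaults to `false`, in which case filebeat picks one host and sticks with it:

```yaml
# filebeat.yml (sketch) - balance across 12 Logstash endpoints
output.logstash:
  hosts: ["10.0.0.99:5044", "10.0.0.99:5045", "10.0.0.99:5046",
          "10.0.0.99:5047", "10.0.0.99:5048", "10.0.0.99:5049",
          "10.0.0.99:5050", "10.0.0.99:5051", "10.0.0.99:5052",
          "10.0.0.99:5053", "10.0.0.99:5054", "10.0.0.99:5055"]
  loadbalance: true
```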
On the logstash side, all 12 of the docker containers use the same pipeline. In the input section I have this:
input {
  beats {
    port => 5054
  }
  beats {
    port => 5045
  }
  beats {
    port => 5046
  }
  beats {
    port => 5047
  }
  beats {
    port => 5048
  }
  beats {
    port => 5049
  }
  beats {
    port => 5050
  }
  beats {
    port => 5051
  }
  beats {
    port => 5052
  }
  beats {
    port => 5053
  }
  beats {
    port => 5044
  }
  beats {
    port => 5055
  }
}
Since I couldn't ever find any specific examples of how to craft the Logstash pipeline for this, I took a wild guess and added a separate beats input for each of the 12 ports.
Your output configuration for filebeat looks fine, but I don't completely understand why you need to open all ports in every Logstash instance. Since they are all on the same machine, is it possible that only one of them is actually listening on all the ports at 10.0.0.99?
That is the crux of my problem - I can't find any examples online of how to set up the logstash instances when filebeats is using the loadbalance settings.
One of my colleagues suggested that logstash can be clustered, i.e. have two clusters with six instances of logstash each, and the logstash instances within each cluster all talk to each other rather than filebeat to each individual logstash instance (the only information I found online regarding clustering logstash was it doesn't support it - but that was from 2017, I did not find anything else on that topic from this year). While that looks good on paper, how to actually build and implement it doesn't seem to exist anywhere that I can find. Even your own documentation shows multiple instances of logstash but there's no details anywhere (that I can find at least) on how to actually configure them.
So I guess what I am really looking for is the Logstash side of setting up filebeat load balancing in a Docker environment.
Thanks for the response - if there's some documentation or an actual deployment of a loadbalanced filebeat point to multiple logstash instances you can point me to that will help I'd appreciate it.
No special configuration is needed on the Logstash side. If you deploy multiple instances, you usually want them all to have the same configuration; they don't need anything extra to work with the Beats load balancing settings.
You usually don't run multiple Logstash instances on the same host. To scale up, you deploy more servers with the same configuration and add them to the hosts list in your clients. If you see that a Logstash instance could be using more CPU, you can increase the number of workers with the pipeline.workers option.
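As a hedged example, the worker and batch settings live in logstash.yml (the values here are illustrative, not recommendations):

```yaml
# logstash.yml (sketch) - tune parallelism before adding instances
pipeline.workers: 8       # defaults to the number of CPU cores
pipeline.batch.size: 250  # events per worker batch; default is 125
```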
There is no logstash "clustering". What you can have is multiple groups of logstash servers if they have different use cases, in this case you may have different configurations, one for each use case, but all instances in the same group will still have the same configuration. You can also have more complicated architectures with multiple levels of logstash instances, but this is usually not needed to scale.
I don't think I am...? Here's my docker-compose file (note: Due to the 7,000 character limit on posts, I only show the first 4 instances of logstash here):
All 12 instances are using the same master.conf file (that's where the input section I already posted is from).
It's working like this now; I just wanted to know whether I'm doing this the convoluted, hard way or whether there's something I missed that would simplify the whole thing.
Oh, I see: even though you open 12 ports in every container, you only expose one port per container. You could instead have a single beats input with the default configuration and change the port forwarding in the docker-compose file, so that logstash1 has ports: ["5044:5044"], logstash2 has ports: ["5045:5044"], and so on.
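Sketching that suggestion out (the image tag and service names are assumptions, not from this thread), each container would share one simplified pipeline:

```
input {
  beats {
    port => 5044
  }
}
```

and only the host-side port would differ in the compose file:

```yaml
# docker-compose.yml (sketch) - every container listens on 5044
# internally; the host-side port is what varies.
services:
  logstash1:
    image: docker.elastic.co/logstash/logstash:7.9.0
    ports:
      - "5044:5044"   # host 5044 -> container 5044
  logstash2:
    image: docker.elastic.co/logstash/logstash:7.9.0
    ports:
      - "5045:5044"   # host 5045 -> container 5044
  # logstash3 through logstash12 would follow the same pattern
```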
But indeed, this looks a bit convoluted. I still wonder why you are trying to start so many Logstash instances on the same machine. That should only be needed if you have slow outputs or slow filters (grok, for example). In general, to scale you could start by increasing the number of workers, and only if that still can't handle the load should you try starting more instances.
I have only been using ELK for about the last five months. I was tasked with bringing up an ELK stack for an internal project at work, and until then I'd never heard of Elasticsearch. So the 'convoluted' method is due to my lack of experience with the product. Add to that the fact that I only started using X-Pack last week to monitor the stack, and I am still working through interpreting what all the graphs and charts mean.
I am at a point in the development loop where I'm trying to improve performance, and that's also a new topic for me. Your advice to increase the workers rather than adding instances is well taken - that will be my focus now. From what I've read there's a lot of ways to slice and dice performance settings - hopefully I'll be able to get something deployed soon that will work for our teams.
Thank you very much for your help on this - I really appreciate it.
My next task is to open a new forum topic to help me figure out why I am seeing _rubyexception errors. So much to learn!