I have a setup with lots of SNMP traps coming in, which I started on Logstash 1.5. So I set up a "Logstash shipper" (just inputs and one output to two Redis servers, no filters) and a "Logstash indexer", another instance with inputs that read from the Redis servers, all the filters, and an elasticsearch output.
This works even during bursts of traps, because the first Logstash instance is tremendously fast and Redis can absorb whatever it sends.
Can I replace this setup with a single Logstash 6 instance running two pipelines, one with all the configuration of the shipper and the other with that of the indexer? According to the documentation they should not interfere with each other, so I think it should theoretically work as expected. What I'm asking is whether I'm forgetting something, or whether experience shows that it doesn't work out as expected.
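For reference, such a split could be declared in `pipelines.yml` roughly like this (the pipeline IDs and config paths here are illustrative assumptions, not my actual layout):

```yaml
# pipelines.yml — one Logstash 6 instance, two isolated pipelines
# (pipeline IDs and paths are assumptions for illustration)
- pipeline.id: shipper
  path.config: "/etc/logstash/shipper/*.conf"
- pipeline.id: indexer
  path.config: "/etc/logstash/indexer/*.conf"
```

Each entry gets its own queue and its own workers, which is why the documentation says they don't interfere.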
This will work. I have had Logstash 6.x running with as many as 43 different pipelines within a single Logstash instance. In this scenario Redis is used for "intra-pipeline communication", and careful tagging controls how events are routed to the necessary pipelines based on the processing each message requires.
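As a sketch of that tagging approach (host names, key names, and tags here are assumptions): the shipper pushes tagged events into a Redis list, and the indexer pipeline that handles that kind of processing pulls only from that list:

```
# Shipper pipeline output — route tagged events to a dedicated list
output {
  if "snmptrap" in [tags] {
    redis {
      host      => "redis-1"
      data_type => "list"
      key       => "traps"
    }
  }
}

# Indexer pipeline input — reads only the "traps" list
input {
  redis {
    host      => "redis-1"
    data_type => "list"
    key       => "traps"
  }
}
```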
The only thing I would caution you about is that when you have everything in a single instance, including collection, a restart of Logstash also means losing collection for the time it takes Logstash to come back up. IMO one of the main benefits of using Redis immediately after raw collection is that you can restart your "processing instances" and the raw data will be sitting in Redis waiting for them when they become available again. You lose this benefit if you run everything in a single instance.
BTW, I would also recommend setting your collection pipeline to use a persistent queue. That way, if Redis becomes unavailable, data will go to the persistent queue until Redis is available again.
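In `pipelines.yml` that would look roughly like this for the collection pipeline (the pipeline ID, path, and size limit are assumptions; note the setting value is `persisted`):

```yaml
- pipeline.id: shipper
  path.config: "/etc/logstash/shipper/*.conf"
  queue.type: persisted    # buffer events to disk when the output is blocked
  queue.max_bytes: 4gb     # cap the on-disk queue size (default is 1024mb)
```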
Thanks, @rcowart , I'll give it a try. And thank you for pointing out the downtime issue. I assume this would not happen if I only change the configuration and Logstash does an automatic reload? The reload should affect only the pipeline I changed, and automatic config reload is supposed to work seamlessly anyway.
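For anyone else reading along: automatic reload is enabled in `logstash.yml` (the interval shown is the default):

```yaml
config.reload.automatic: true
config.reload.interval: 3s
```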
I'm not much of a fan of the persistent queue. I always thought that having events pass only through memory, until Redis persists them to disk, is the best way not to lose any. Using the persistent queue should make the I/O on the host explode and make overall performance dependent on disk performance. I was going to test the performance impact of the persistent queue anyway, and maybe I'll take your tip and implement it for the collection pipeline.
Thanks a lot.
I am also not a fan of the persistent queue, but keep in mind that events will only pile up in the queue if Logstash can't forward them on, which would be the case if Redis was unreachable (e.g. you need to restart your Redis environment for an upgrade). So in this case the persistent queue will almost never fill up, and will impose no penalty on event throughput.
You are correct that auto-reload will prevent the need for a restart in many scenarios. However, what about an upgrade? Or when you need Logstash to pick up new environment variables (if you use those in your pipelines... I do, a lot)? In the end it all comes down to how much data you are willing to miss.
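For anyone unfamiliar with the feature: pipeline configs can reference environment variables with `${VAR}` syntax, optionally with a default after a colon (the variable name here is an assumption):

```
input {
  redis {
    # REDIS_HOST is resolved from the environment; falls back to localhost
    host      => "${REDIS_HOST:localhost}"
    data_type => "list"
    key       => "traps"
  }
}
```

Because these are resolved when Logstash starts, changing a variable's value requires a restart, not just a config reload — which is exactly the downtime case above.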
As far as I understand the persistent queue feature, it writes every single event to disk before acknowledging it, and only removes the event from the persistent queue once it has been processed by the filters and outputs. So it should have a significant impact.
I never used environment variables. I tried once in a very early version but it didn't work out, and I'm guilty of not having tried again. I'll give it a new try, and if I get stuck I'll open a new topic.