My workflow is a bit complicated, because:

- I need to correlate several different lines (events) with one another
- the lines are not consecutive
- the lines do not share any common label or tag

Because of that, I implemented the filtering with the Ruby filter plugin.
In the Ruby filter I do something like this (in Ruby, but written 'à la Python'):
```
if event.get("interesting_field1")
  # --- we record in memory the first field we want ---
  $field1 = event.get("interesting_field1")
  event.cancel
elsif event.get("interesting_field2")
  # --- we record in memory the second field we want ---
  $field2 = event.get("interesting_field2")
  event.cancel
elsif event.get("final_field")
  # --- we now release the fields we want ---
  event.set("field1", $field1)
  event.set("field2", $field2)
else
  event.cancel
end
```
Unless I misread the results, I had to run Logstash with a single pipeline worker (workers=1); otherwise each worker would pick up a more or less random subset of the events, and I couldn't correlate them the way I need.
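For reference, this is how I currently force the single worker (just a sketch; I set it in logstash.yml, but as far as I know the `-w 1` / `--pipeline.workers 1` command-line flag does the same thing):

```
# logstash.yml
# one filter/output worker, so all events go through the same Ruby filter thread
pipeline.workers: 1
```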
First question: is what I just said correct, or is it still possible to get the same results with multiple workers?
If yes, does that increase performance?
Some more context for my second question:

- I will be parsing data from multiple hosts with Logstash (each host ships its logs with Filebeat).
- So the hostname is also a field in each event.
- I only need to correlate events coming from the same host (of course, the real Ruby code is a bit more complex: I record the values in Hashes keyed by the hostname, not in simple variables; see the sketch after this list).
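To make that concrete, the per-host bookkeeping looks roughly like this (a simplified sketch, not the real filter; I'm assuming here that the hostname ends up in a field literally called "hostname" — in practice it is whatever field Filebeat puts it in):

```
# global Hashes with one entry per host, instead of simple global variables
$field1 ||= {}
$field2 ||= {}

host = event.get("hostname")   # placeholder name for the host field

if event.get("interesting_field1")
  $field1[host] = event.get("interesting_field1")
  event.cancel
elsif event.get("interesting_field2")
  $field2[host] = event.get("interesting_field2")
  event.cancel
elsif event.get("final_field")
  # release the values previously recorded for this same host
  event.set("field1", $field1[host])
  event.set("field2", $field2[host])
else
  event.cancel
end
```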
So I was wondering whether there is a way to run multiple workers while forcing each worker to process only the events coming from a single input host.
In other words: select the worker based on the value of the "hostname" field. Is that possible?
Or would it be more efficient, in terms of CPU usage and memory consumption, to run one Logstash instance per core, each one listening on a different port, and configure each Filebeat to send its output to one of those ports?
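Just to illustrate what I mean by that last option, the setup I have in mind would look roughly like this (only a sketch; the port numbers and server name are made up):

```
# pipeline config of Logstash instance #1 (instance #2 would use port 5045, etc.)
input {
  beats {
    port => 5044
  }
}

# filebeat.yml on the hosts assigned to instance #1
output.logstash:
  hosts: ["my-logstash-server:5044"]
```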
Any comment is more than welcome.