Following the guidance in Two filebeat instances, I have successfully set up two instances of Filebeat on one machine. Instance1 reads logs for AppA and ships them to Logstash:11000, while Instance2 reads logs for AppB and ships them to Logstash:11001.
Now I'm running into the issue where my Logstash pipeline config file for port 11000 is also receiving the events sent to port 11001. Why is this happening? FWIW, Filebeat Instance2 was failing to send events to Logstash:11001 until I added the pipeline config file for port 11001.
AppA filebeat.yml:
output.logstash:
  # The Logstash hosts
  hosts: ["logstash:11000"]
AppA basic pipeline config:
input {
  beats {
    id => "appa_beats"
    client_inactivity_timeout => 1200
    port => 11000
  }
}
AppB filebeat.yml:
output.logstash:
  # The Logstash hosts
  hosts: ["logstash:11001"]
AppB basic pipeline config:
input {
  beats {
    id => "appb_beats"
    client_inactivity_timeout => 1200
    port => 11001
  }
}
If I had to guess, and since you haven't shown us your pipeline config I do have to guess, then I would say that you are pointing one pipeline to a directory that contains both configurations. What does your pipeline configuration file look like?
Yes, I'm only running one pipeline. Multiple pipelines are a new 6+ feature that I haven't looked into much, particularly since I'm unsure why I would need more than one.
Are you saying that the beats input plugin does not actually filter on the port even though the port is a required parameter? As such, I'm forced to run them in separate pipelines?
Unless you're using Logstash 6 multiple pipelines, all your pipeline configurations are essentially merged into one.
In your example events will be received by both inputs (ports 11000 & 11001), and output to both logstash:11000 & logstash:11001. The alternative to using multiple pipelines is to tag events on your inputs, and then wrap the outputs in conditionals:
i.e.
Input:
input {
  beats {
    id => "appa_beats"
    client_inactivity_timeout => 1200
    port => 11000
    tags => ["appa"]
  }
}
or
input {
  beats {
    id => "appa_beats"
    client_inactivity_timeout => 1200
    port => 11000
    type => "appa"
  }
}
output {
  if "appa" in [tags] {
    # whichever output plugin this pipeline actually uses, e.g. elasticsearch
    elasticsearch {
      hosts => ["elasticsearch:9200"]
    }
  }
}
Ultimately though I'd definitely suggest looking at using multiple pipelines. It'll help keep your configuration cleaner since there's less need for confusing conditionals, and it also limits the blast radius of an output being down, a filter crashing, etc.
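For reference, a minimal pipelines.yml sketch for your setup could look like the below; the pipeline ids and config file paths are assumptions, so point them at wherever your two config files actually live:
# pipelines.yml (Logstash 6+) - one pipeline per app; ids and paths are assumptions
- pipeline.id: appa
  path.config: "/etc/logstash/conf.d/appa.conf"
- pipeline.id: appb
  path.config: "/etc/logstash/conf.d/appb.conf"
Each pipeline then only ever sees the inputs, filters, and outputs defined in its own file.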
Well, that's a bit counter-intuitive particularly since you have to specify the port you want to listen on and ideally an id for the input. If I'm specifying that differentiating information, surely the pipeline could respect it.
Thanks for clearing up my confusion, since the docs don't mention this all that clearly.
Well, that's a bit counter-intuitive particularly since you have to specify the port you want to listen on and ideally an id for the input.
You're jumping to conclusions. What does the fact that you specify a port and an id for an input have to do with the way multiple configuration files are dealt with?
This was my understanding after reading various points of the documentation:
If I define beats { id: A, port 0 } in one file and beats { id: B, port 1 } in a different file; they would get merged into 2 separate instances of the beats plugin as I defined them to be different.
Whereas, if I define beats { id: A, port 0 } in one file and beats { id: A, port 0 } in a different file; they would get merged into 1 single instance of the beats plugin as I defined them to be the same.
I would argue that Logstash is the first program that I've encountered that co-mingles data read from one port with data read from a different port; hence my original confusion.
shrugs Maybe it's just me; however, I would say that the documentation is muddy at best particularly since there doesn't seem to be a page dedicated to explaining the configuration merge process.
Finally, what happens when I define beats { id: A, port 0} in one file and http { id: B, port 1} in a different file? My expectation would be that they would use their own filters / outputs defined in each of their files; but, I'm starting to think that you'll say that even their input data would be co-mingled.
Wow... no wonder multiple pipelines were added to Logstash 6+. You are forced to use them (or separate Logstash instances) every time you want to log a different style of event.
If I define beats { id: A, port 0 } in one file and beats { id: B, port 1 } in a different file; they would get merged into 2 separate instances of the beats plugin as I defined them to be different.
That's correct.
Whereas, if I define beats { id: A, port 0 } in one file and beats { id: A, port 0 } in a different file; they would get merged into 1 single instance of the beats plugin as I defined them to be the same.
No merging takes place. You'd get an error when the second input starts up since the listening port would already be busy.
I would argue that Logstash is the first program that I've encountered that co-mingles data read from one port with data read from a different port; hence my original confusion.
rsyslogd and Postfix are two counterexamples that spring to mind.
Maybe it's just me; however, I would say that the documentation is muddy at best particularly since there doesn't seem to be a page dedicated to explaining the configuration merge process.
No, this is something that should be explicitly described since it's a common misconception. I don't know why people have the preconception that each file in a Logstash dropfolder is independent; I can't think of another piece of software whose dropfolder support is anything but a convenient way of splitting big configuration files into smaller ones.
Wow... no wonder multiple pipelines were added to Logstash 6+. You are forced to use them (or separate Logstash instances) every time you want to log a different style of event.
No, you're not. You can use conditionals to select which filters and outputs to apply. The opposite situation (one file == one pipeline) would be much worse since it would require configuration duplication (or some kind of scripted or templated configuration file merging done outside of Logstash) if you wanted to apply the same filters and outputs to inputs from different sources.
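To make that concrete, here's a rough sketch (the field name, value, and grok pattern are assumptions) of a filter section where one filter is shared by events from both inputs while another only applies to AppA events:
filter {
  # shared: runs for events from both beats inputs, no duplication needed
  mutate {
    add_field => { "environment" => "production" }
  }

  # AppA-only: relies on the "appa" tag added by its input
  if "appa" in [tags] {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
  }
}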
It is possible that my preference towards Windows and C# has precluded me from being exposed to the annoyance of port data mixing.
I was aware that the files got combined. What I was not aware of was that there was no direct association between the inputs, filters, and outputs defined in the same file. Tbh, it would actually be much more straightforward if you could not have all three in the same file and instead there was something like: inputs.conf, filters.conf, outputs.conf
While it's still a bit messy and not that SOLID, the above is still mostly manageable when all of your log events have a similar structure and/or are sent to the same input. However, adding different-looking logs sent to a different port becomes quite unwieldy without using a second pipeline.
Personally, I think Logstash would be easier to use / config out of the box if it maintained the direct association of input ports / filters / outputs defined in each file. In that way, you could have for example:
FileA processes IIS logs
FileB processes application-generated logs for AppA sent via Filebeat
FileC processes application-generated logs for AppB sent via Filebeat
without having to expend the brain power to remember that you need to wrap everything in conditionals verifying that you are indeed working with the expected event type.
Now that I know Logstash doesn't work that way, it's not super complex in v6+ to isolate each input port to its own pipeline. Although it's still a bit annoying that I'm forced to incur the extra overhead of running another instance and/or pipeline just to ensure that the data sent to each port is correctly associated with the applicable filters / outputs.
Finally, https://pastebin.com/VJsf63Gq is the real world config I'm currently using to process events sent to port 11000. As you can see, it would be quite cumbersome to add even more conditional wrappers to verify that I am indeed working with events sent to port 11000 as I expect.
Personally, I think Logstash would be easier to use / config out of the box if it maintained the direct association of input ports / filters / outputs defined in each file.
As I noted earlier I disagree since it's much more onerous to work around if you want to share filters and outputs.
As you can see, it would be quite cumbersome to add even more conditional wrappers to verify that I am indeed working with events sent to port 11000 as I expect.
In practice you'd have one file with filters per log type, each having a single "if log type is this or that" conditional at the top of the file that wraps all filters within.
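Something along these lines (file names, the tag, and the patterns are assumptions):
# 30-appa-filters.conf -- everything AppA-specific lives behind one conditional
filter {
  if "appa" in [tags] {
    grok { match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:rest}" } }
    date { match => ["ts", "ISO8601"] }
  }
}
# a 31-appb-filters.conf would look the same, with its own tag and patterns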
Filters are unrelated to the input type and port. Use the inputs and possibly helper filters to classify incoming events, then apply filters according to that classification. Additionally, some filters can be agnostic to the classification itself and look at other traits of events (like perform a DNS lookup if a particular field contains an IP address).
I guess I just don't understand how you programmatically know that EventA is of TypeA or TypeB without having some notion of how the event arrived into the system.
Can you provide the code for this example? I wasn't aware that you could check the types of fields to discover that FieldA, FieldB, and FieldC are IP addresses and thus perform the generic filter on them.
I guess I just don't understand how you programmatically know that EventA is of TypeA or TypeB without having some notion of how the event arrived into the system.
Depends on what event sources you have, but strive towards letting the event source declare the kind of event it sends. For example, if you're using Filebeat use the fields option to add whatever fields you need.
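For example, a rough filebeat.yml sketch (assuming a reasonably recent Filebeat; the path and the field name/value are assumptions) that stamps every AppA event with an app field:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/appa/*.log      # path is an assumption
    fields:
      app: appa                  # arbitrary marker field for this source
    fields_under_root: true      # put [app] at the event root instead of under [fields]
On the Logstash side you'd then test if [app] == "appa" in your filters and outputs instead of (or in addition to) tags.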
Can you provide the code for this example? I wasn't aware that you could check the types of fields to discover that FieldA, FieldB, and FieldC are IP addresses and thus perform the generic filter on them.
if [somefield] =~ /^\d+\.\d+\.\d+\.\d+$/ {
  dns {
    ...
  }
}
If it looks like an IP address and quacks like an IP address it's probably an IP address. But yeah, you have to list the names of the fields to inspect in this manner. Short of using a ruby filter you can't instruct Logstash to perform something on any field whose contents match a regexp.
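If you really did want the "any field that looks like an IP" behaviour, a ruby filter sketch might look roughly like this (the tag name is an assumption):
filter {
  ruby {
    code => '
      ip_re = /\A\d{1,3}(\.\d{1,3}){3}\z/
      # event.to_hash returns a copy, so tagging while iterating is safe
      event.to_hash.each do |name, value|
        event.tag("looks_like_ip") if value.is_a?(String) && value =~ ip_re
      end
    '
  }
}
You could then wrap the dns filter (or whatever else) in an if "looks_like_ip" in [tags] conditional.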
At the end of the day, it comes down to a style preference.
I prefer to keep things as SOLID as possible. Meanwhile, I can't help but picture a jumbled mess of spaghetti and pity the person who has to maintain it when thinking about the config as you describe it.
On the bright side, this thread has helped me understand how Logstash pipelines operate much more completely than I did after reading the documentation.