I am new to this so please bear with me as I need to take small steps to understand this. I am trying to parse PFSense 2.4.4 logs, sent via syslog, into the Elastic Stack 6.5. I am getting a bit confused with some logs, and here are the questions I have so far:
Even though the syslog events seem to have their fields separated into the right mappings, I still see that the message field packs a lump of comma-separated field data, which I would assume only happens when there are parsing errors. Or is this perhaps a feature for keeping the raw event even when the log gets parsed correctly?
In the tags field of the events I see: _grokparsefailure, PFSense, firewall, GeoIP. Does the _grokparsefailure part mean there is a parsing error? What about the rest of the tags - or does that just mean those are the available tags?
Why is GeoIP only running on the public IP address of my router? I think it would be more useful to have that information on the external IP addresses regardless of traffic direction. Can this be changed, and if so, how?
The message field will remain until you remove it with "remove_field" - you are correct. Parsing does nothing to the original field, assuming you do not overwrite it.
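For example, a grok filter can drop the raw message once parsing succeeds - the pattern and field names below are placeholders for illustration, not your actual config:

```
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{GREEDYDATA:msg_body}" }
    # remove_field only runs if the grok match succeeds,
    # so a failed parse still keeps the raw message for debugging
    remove_field => ["message"]
  }
}
```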
_grokparsefailure means that your grok filter is not fully parsing the log line. You may find that many of the fields are correct and it is only failing on a small part. The other tags are set elsewhere in your input or filter configuration.
GeoIP runs on whatever field you give it as the input. So if it is running on the public address of your router, that means you have told GeoIP to run on the public address of your router.
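A minimal sketch of pointing it at a different field - assuming your parsed destination address lands in a field called dest_ip (swap in whatever your filter actually produces):

```
filter {
  geoip {
    # lookups run against whichever field you name here;
    # point it at dest_ip instead of (or as well as) the router address
    source => "dest_ip"
    target => "dest_geoip"
  }
}
```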
@Eniqmatic, thanks for the clarification. I have been doing some work on this in the past few days and, after more than a few rebuilds, I think I have a baseline stack working well enough to play with. Now for customization, and thus more questions:
How to troubleshoot Grok parsing issues? Any tools and techniques?
My Logstash configuration files for PFSense look like this:
Would a Boolean operator like **or "dest_ip"** added to the config alongside the source field, on the same line as in the screenshot above, get geolocations for both the source and destination IP addresses? Or should I add this, or something else, elsewhere to get them both?
Is there official documentation on how to configure and troubleshoot the components of the stack? I'm looking online for how to get things done and there is quite a lot out there, but I could not find the reason why numbers are so often used in the names of Logstash configuration files in tutorials, i.e. 01-inputs.conf, etc. Does the system actually use those to line up each input with its respective filter and/or output configuration files, or is it just for the admin to keep track of things?
There are a few tools for debugging grok: there is one built into the Dev Tools in Kibana (Dev Tools > "Grok Debugger" at the top), where you can paste a sample line and your pattern and it will tell you whether it parsed, and there is another one here: http://grokconstructor.appspot.com/do/match
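For example, in the Kibana Grok Debugger you paste the sample data and the pattern into separate boxes - something like this (the sample line and pattern below are made up for illustration):

```
# Sample data (one raw syslog line):
Dec 10 13:45:01 filterlog: 5,,,1000000103,igb1,match,block,in,4

# Grok pattern to test against it:
%{SYSLOGTIMESTAMP:timestamp} %{WORD:program}: %{GREEDYDATA:csv_fields}
```

If the pattern only matches part of the line, the debugger shows which fields were extracted, which helps narrow down the bit that causes _grokparsefailure.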
There is no point running GeoIP on both, because only one of the addresses is going to be a public address. So I would run it on one of them using an if filter, as you say:
if source ip == "internal/private IP address" {
    # do nothing
} else {
    geoip { }
}
And repeat for your destination address. The code would look something like:
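One way to sketch it is with the cidr filter, which tags events whose address falls in a given range - the field names src_ip and dest_ip here are assumptions, so swap in whatever your grok filter actually produces:

```
filter {
  # tag events whose source address falls in RFC 1918 private space
  cidr {
    address => ["%{src_ip}"]
    network => ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
    add_tag => ["src_ip_private"]
  }
  # only do a GeoIP lookup when the source address was not tagged private
  if "src_ip_private" not in [tags] {
    geoip {
      source => "src_ip"
      target => "src_geoip"
    }
  }
  # repeat the same cidr/if pair for dest_ip
}
```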
About the GeoIP thing: let's say I am trying to identify where my systems are connecting to, country-wise, i.e. outgoing traffic - I would still find it interesting to have the GeoIP information on that. So I would want an "if" statement like the one you wrote above, to avoid having GeoIP do lookups on private IP addresses while still doing lookups on public IP addresses, whether source or destination.
How could I configure/write a filter which would do this?