Most tutorials simply use the built-in filters and leave it at that, but I am trying to parse an LDAP log and am running into a few issues. The following post almost matches my issue, but I need more clarification in general as I am totally new.
Here is an example of one line of the log, although there are about six other types of events that are logged:
[28/Sep/2018:14:08:43.893180585 -0500] conn=12345678 fd=77 slot=77 SSL connection from 10.10.10.10 to 192.192.192.192
So I initially sent this log over using the Logstash built-in syslog filter and it sort of worked. Somehow it derived the timestamp from the first block of text. My filter starts with this:
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}
So this appears to work, but I am not sure how. It does not work on the GrokDebug site, so I am not sure what is going on. Also, do the square brackets not cause problems with the parsing?
How does Grok handle delineations? Here they are spaces, some logs are commas, others may be tabs. In this case it seemed to just know that spaces separate the fields.
So I can create a field for each section of the line, for example, conn=%{INT:conn} fd=%{INT:fd} slot=%{INT:slot}. That works great. BUT what if I don't want to parse the fd field for example. Can I somehow tell Grok to skip a section and start parsing at the next section?
I know this goes into a lot of filtering basics but I can't seem to find this explained in detail anywhere.
Neither am I. SYSLOGTIMESTAMP shouldn't work with the example given.
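To see why, here is a rough sketch in Python. The two regular expressions are hand-written approximations, not the real grok definitions: the first assumes SYSLOGTIMESTAMP expects a "Sep 28 14:08:43" shape, the second is an assumed pattern for the bracketed timestamp in the sample line. Note that the square brackets have to be escaped, because unescaped brackets start a character class in a regular expression.

```python
import re

line = ("[28/Sep/2018:14:08:43.893180585 -0500] conn=12345678 fd=77 slot=77 "
        "SSL connection from 10.10.10.10 to 192.192.192.192")

# Approximation of a syslog-style timestamp: "Sep 28 14:08:43".
syslog_like = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")

# Assumed pattern for this log's timestamp. \[ and \] match literal brackets.
bracketed = re.compile(
    r"\[(?P<timestamp>\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2}(?:\.\d+)? [+-]\d{4})\]"
)

syslog_match = syslog_like.search(line)   # no syslog-style timestamp in the line
bracket_match = bracketed.search(line)    # finds the bracketed timestamp

print(syslog_match)
print(bracket_match.group("timestamp"))
```

In grok itself the same idea would be an escaped bracket followed by a date pattern. The stock HTTPDATE pattern has a similar day/month/year shape and may be worth trying in the debugger, though I have not verified it against this exact line.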
How does Grok handle delineations? Here they are spaces, some logs are commas, others may be tabs. In this case it seemed to just know that spaces separate the fields.
Grok expressions are really just regular expressions, and they need to match exactly.
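As an illustration only, the toy expander below turns a grok-style string into a plain regular expression. It is not how Logstash is implemented; `PATTERNS` and `expand` are hypothetical names, and the library holds just a simplified INT. It shows that the spaces in the expression are ordinary literal characters, so a comma-separated line does not match.

```python
import re

# Toy pattern library. Real grok ships many more definitions.
PATTERNS = {"INT": r"[+-]?[0-9]+"}

def expand(grok):
    """Turn %{NAME:field} into a named group and %{NAME} into an unnamed one."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.sub(r"%\{(\w+)(?::(\w+))?\}", repl, grok)

regex = expand("conn=%{INT:conn} fd=%{INT:fd} slot=%{INT:slot}")

# The literal spaces in the expression must appear in the input.
spaced = re.search(regex, "conn=12345678 fd=77 slot=77 SSL connection")
comma = re.search(regex, "conn=12345678,fd=77,slot=77")

print(regex)
print(spaced.groupdict())
print(comma)
```

So grok does not "know" the delimiter. For comma or tab separated logs you write the comma or `\t` into the expression yourself.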
So I can create a field for each section of the line, for example, conn=%{INT:conn} fd=%{INT:fd} slot=%{INT:slot}. That works great. BUT what if I don't want to parse the fd field for example. Can I somehow tell Grok to skip a section and start parsing at the next section?
The ? operator in regular expressions means "zero or one occurrence of the preceding token", which together with parentheses allows you to specify that a sequence of tokens is optional. Your example could be modified to
conn=%{INT:conn} (fd=%{INT:fd} )?slot=%{INT:slot}
to make the fd field optional. Note that the kv filter is usually the best option for parsing key/value pairs.
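The same optional group can be tried out in plain Python as a sketch. The regex is a hand-written equivalent of the grok expression with `\d+` standing in for INT, and it uses a non-capturing group `(?:...)` for the optional part. The second half only imitates the idea behind the kv filter (split on spaces, then on `=`) and is not the filter's actual code.

```python
import re

# "(?: ... )?" makes the whole "fd=NN " chunk optional.
pattern = re.compile(r"conn=(?P<conn>\d+) (?:fd=(?P<fd>\d+) )?slot=(?P<slot>\d+)")

with_fd = pattern.search("conn=12345678 fd=77 slot=77").groupdict()
without_fd = pattern.search("conn=12345678 slot=77").groupdict()
print(with_fd)
print(without_fd)  # fd is None when the chunk is absent

# Imitation of key/value parsing: keep only tokens that contain "=".
line = "conn=12345678 fd=77 slot=77 SSL connection from 10.10.10.10"
pairs = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
print(pairs)
```

The key/value approach does not care about field order or how many pairs appear, which is why it tends to be less brittle than one long expression.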
But wait. If you actually meant "the fd=X stuff is always there, I just don't care about capturing it into a field" you're actually looking for this:
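The snippet that followed here is missing from this copy of the thread. As a sketch of the idea only: with default grok settings, a pattern reference written without a field name, such as `%{INT}` instead of `%{INT:fd}`, still has to match but is not stored as a field. The Python regex below is a hand-written equivalent, with `\d+` standing in for INT.

```python
import re

# fd=NN must be present, but it has no named group, so nothing is captured for it.
pattern = re.compile(r"conn=(?P<conn>\d+) fd=\d+ slot=(?P<slot>\d+)")

fields = pattern.search("conn=12345678 fd=77 slot=77 SSL connection").groupdict()
missing_fd = pattern.search("conn=12345678 slot=77")

print(fields)      # only conn and slot
print(missing_fd)  # None, because fd=NN is required here
```

Unlike the optional-group version, this one still requires the fd part to be in the line. It just does not keep the value.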
Thanks Magnus, I am slowly getting my mind in the correct context to digest this stuff. Yes, it was the second option of always there but I don't care about capturing. Very helpful stuff.