As part of upgrading to 2.x, I'm re-evaluating which logs I ship and how I filter them. I'll be working on figuring this all out on my own, of course. But if someone has already figured this out and can quickly share it, that'd be appreciated.
What I need to know:
- First, is it worth logging them?
- Second, how should I filter them into useful information?
- This is especially important for logs that don't have normal timestamps.
The log files I'm looking into right now, pulled from a couple of Ubuntu 14.04 VMs:
- /var/log/dmesg
  - From http://unix.stackexchange.com/questions/35851/whats-the-difference-of-dmesg-output-and-var-log-messages it sounds like dmesg output also ends up in syslog, so there would be no benefit in shipping it. Right?
  - I see no timestamps in the logs.
  - It looks like at least some of the messages span multiple lines. How would you differentiate between messages? (See the multiline sketch after this list.)
- /var/log/boot.log
  - Is it worth it?
  - I could see how being able to see what messages were logged when the OS booted could be useful. But there's no timestamp. So would it make the most sense to capture everything in one ES doc, with the date of when the doc was saved?
- /var/log/btmp
  - According to one blog post, it contains info about the latest failed logins. On the server I'm looking at, it's empty, so how would I filter it? I'll check other servers soon.
- /var/log/ConsoleKit/history
  - Looks like it's information about SSH and other kinds of login sessions. Each line appears to start with a Unix timestamp, right?
  - How would you filter this? Looks like there could be a way to use the kv plugin (see the kv sketch after this list). Hmm...
- /var/log/kern.log
  - It looks like it's formatted the same as syslog, but grepping for one of its messages did not find it in syslog. So I should ship that log, right?
  - Would you do anything more than a generic grok with the syslog pattern? (A minimal example follows this list.)
- /var/log/upstart/*
  - Contains a whole bunch of .gz files. They look like they're just compressed copies of some of the actual log files in /var/log, which doesn't make sense. 'Course, I'm likely not reading them correctly. I don't think it's worth figuring out how to ship them, but is it?
  - If it is worth it, how would I do that? gz is binary. Can Filebeat read that?
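To make the multiline question on /var/log/dmesg concrete, this is the kind of thing I'm considering. A minimal sketch, assuming every new message starts with the bracketed seconds-since-boot counter and anything else is a continuation:

```
input {
  file {
    path => "/var/log/dmesg"
    # Assumption: a new message starts with "[   12.345678]"; any line
    # that doesn't is folded into the previous event.
    codec => multiline {
      pattern => "^\["
      negate  => true
      what    => "previous"
    }
  }
}
```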
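For /var/log/ConsoleKit/history, this is roughly the kv approach I was imagining. The line layout assumed here (epoch timestamp, a type= token, then key='value' pairs) is just from eyeballing my own files, so treat the grok pattern as a guess:

```
filter {
  # Assumed layout: "1452214519.953 type=SEAT_SESSION_ADDED : seat-id='Seat1' ..."
  grok {
    match => { "message" => "^%{NUMBER:ck_epoch}\s+type=%{WORD:ck_type}(?:\s*:\s*%{GREEDYDATA:ck_kv})?" }
  }
  kv {
    source => "ck_kv"
    trim   => "'"                   # strip the single quotes around values, if kv leaves them
  }
  date {
    match => ["ck_epoch", "UNIX"]   # epoch seconds (incl. fraction) -> @timestamp
  }
}
```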
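And for /var/log/kern.log, the "generic grok with the syslog pattern" I mentioned would be roughly this:

```
filter {
  grok {
    # SYSLOGLINE ships with Logstash and yields timestamp, logsource,
    # program, pid, and message.
    match     => { "message" => "%{SYSLOGLINE}" }
    overwrite => ["message"]   # keep message a string instead of an array
  }
  date {
    # Syslog timestamps carry no year; the date filter assumes the current one.
    match => ["timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss"]
  }
}
```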
Um, has anyone published a post that gives us some decent filters for most of the popular log files? And maybe some general advice on how to develop filters that are useful in Kibana? I can always just ship each line and run the date filter to normalize the timestamp, but there's more to it than that.
For example, a /var/log/apt/history.log entry looks like:
```
Start-Date: 2016-01-15 01:32:09
Upgrade: openssh-server:amd64 (6.6p1-2ubuntu2.3, 6.6p1-2ubuntu2.4), openssh-sftp-server:amd64 (6.6p1-2ubuntu2.3, 6.6p1-2ubuntu2.4), openssh-client:amd64 (6.6p1-2ubuntu2.3, 6.6p1-2ubuntu2.4)
End-Date: 2016-01-15 01:32:11
```
Would I want to use the Start-Date or the End-Date for the timestamp in ES? Should I split up the list of upgraded packages? Would it be useful to have something like (not in any kind of official syntax, but you should get the idea):
```
timestamp: <Start-Date: 2016-01-15 01:32:09>
fields:
  start-date: 2016-01-15 01:32:09
  end-date: 2016-01-15 01:32:11
  operation-type: Upgrade
  packages:
    - name: openssh-server
      architecture: amd64
      old-version: 6.6p1-2ubuntu2.3
      new-version: 6.6p1-2ubuntu2.4
    - name: openssh-sftp-server
      architecture: amd64
      old-version: 6.6p1-2ubuntu2.3
      new-version: 6.6p1-2ubuntu2.4
...
```
How would I get that kind of output from Logstash? What filters/codecs should I use? What's the benefit of putting in the effort to make this happen? How can I optimize the fields for indexing and storage? Etc.
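Here's my rough first stab, a minimal sketch assuming every stanza begins with Start-Date: (so multiline can fold a whole stanza into one event). All the field names (start_date, packages_raw, packages) are my own invention, and the ruby block uses the Logstash 2.x event API (event['field']):

```
input {
  file {
    path => "/var/log/apt/history.log"
    # Fold everything up to the next "Start-Date:" into a single event.
    codec => multiline {
      pattern => "^Start-Date:"
      negate  => true
      what    => "previous"
    }
  }
}

filter {
  grok {
    break_on_match => false   # let each pattern contribute its own field
    match => {
      "message" => [
        "Start-Date: %{TIMESTAMP_ISO8601:start_date}",
        "End-Date: %{TIMESTAMP_ISO8601:end_date}",
        "^(?<operation>Install|Upgrade|Remove|Purge): %{GREEDYDATA:packages_raw}"
      ]
    }
  }
  # Split "pkg:arch (old, new), pkg:arch (old, new), ..." into an array of objects.
  ruby {
    code => "
      raw = event['packages_raw']
      if raw
        event['packages'] = raw.split('), ').map do |chunk|
          name_arch, versions = chunk.split(' (')
          name, arch = name_arch.to_s.split(':')
          old_v, new_v = versions.to_s.delete(')').split(', ')
          { 'name' => name, 'architecture' => arch,
            'old_version' => old_v, 'new_version' => new_v }
        end
      end
    "
  }
  date {
    match => ["start_date", "yyyy-MM-dd HH:mm:ss"]   # index on Start-Date
  }
}
```

No idea yet whether a nested packages array like that maps and indexes well in ES; that's part of what I'm asking.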
Anyway, this is a bit long now... I'll appreciate any feedback I can get. I know I haven't yet researched everything myself. But writing this has clarified things a lot for me, and I think some of it would be a valuable conversation starter.