What logs should I ship, and how should I filter them?

As part of upgrading to 2.x, I'm reevaluating which logs I ship and how I filter them. I'll keep working on figuring this out on my own, of course, but if someone has already done so and can quickly share, that'd be appreciated. :slight_smile:

What I need to know:

  • First, is it worth logging them?
  • Second, how should I filter them into useful information?
  • This is especially important for logs that don't have normal timestamps.

The log files I'm looking into right now, pulled from a couple of Ubuntu 14.04 VMs:

  • /var/log/dmesg

  • From http://unix.stackexchange.com/questions/35851/whats-the-difference-of-dmesg-output-and-var-log-messages it sounds like dmesg output also goes to syslog, so there would be no benefit in shipping this file separately. Right?

  • I see no timestamps in the logs.

  • It looks like at least some of the entries are multiline. How would you tell where one message ends and the next begins?

  • /var/log/boot.log

  • Is it worth it?

  • I can see how the messages logged while the OS booted could be useful. But there's no timestamp, so would it make the most sense to capture everything in one ES doc, dated by when the doc was saved?

  • /var/log/btmp

  • According to one blog post, it contains info about the latest failed logins. On the server I'm looking at it's empty, so how would I filter it? I'll check other servers soon.

  • /var/log/ConsoleKit/history

  • Looks like it's information about SSH or other kinds of login sessions. Each line appears to start with a Unix timestamp, right?

  • How would you filter this? Looks like there could be a way to use the kv plugin. Hmm...

  • /var/log/kern.log

  • It looks like it's formatted the same as syslog. Grepping for one of its messages found nothing in syslog, so I should ship this log too, right?

  • Would you do anything more than a generic grok with the syslog pattern? (There's a sketch of what I have in mind after this list.)

  • /var/log/upstart/*

  • Contains a whole bunch of .gz files. They look like they're just compressed copies of some of the actual log files in /var/log, which doesn't make sense; of course, I'm likely not reading them correctly. I don't think it's worth figuring out how to ship them, but is it?

  • If it is worth it, how would I do that? gz is binary. Can Filebeat read that?
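
For what it's worth, here's the kind of first pass I'm imagining for kern.log and dmesg (referenced from the list above). This is only a sketch: I'm assuming the stock Ubuntu formats, that dmesg lines carry the usual [ seconds-since-boot ] prefix, and that the file input is used (it records the file name in a path field).

filter {
  # kern.log follows the standard syslog layout, so the stock pattern applies.
  if [path] == "/var/log/kern.log" {
    grok {
      match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:syslog_message}" }
    }
    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  # dmesg records seconds since boot rather than wall-clock time, so the
  # best I can do is keep the offset as a number and let @timestamp default
  # to the time the event was read.
  if [path] == "/var/log/dmesg" {
    grok {
      match => { "message" => "^\[\s*%{NUMBER:boot_offset:float}\] %{GREEDYDATA:kernel_message}" }
    }
  }
}

Does that look like a sane starting point?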

Has anyone published a post that gives us some decent filters for most of the popular log files, and maybe some general advice on how to develop filters that are useful in Kibana? I can always just ship each line and run the date filter to normalize the timestamp, but there's more to it than that.

For example, a /var/log/apt/history.log entry looks like:

Start-Date: 2016-01-15  01:32:09
Upgrade: openssh-server:amd64 (6.6p1-2ubuntu2.3, 6.6p1-2ubuntu2.4), openssh-sftp-server:amd64 (6.6p1-2ubuntu2.3, 6.6p1-2ubuntu2.4), openssh-client:amd64 (6.6p1-2ubuntu2.3, 6.6p1-2ubuntu2.4)
End-Date: 2016-01-15  01:32:11

Would I want to use the Start-Date or the End-Date for the timestamp in ES? Should I split up the list of upgraded packages? Would it be useful to have something like (not in any kind of official syntax, but you should get the idea):

timestamp: <Start-Date: 2016-01-15  01:32:09>
fields:
  start-date: 2016-01-15  01:32:09
  end-date: 2016-01-15  01:32:11
  operation-type: Upgrade
  packages:
    - name: openssh-server
      architecture: amd64
      old-version: 6.6p1-2ubuntu2.3
      new-version: 6.6p1-2ubuntu2.4
    - name: openssh-sftp-server
      architecture: amd64
      old-version: 6.6p1-2ubuntu2.3
      new-version: 6.6p1-2ubuntu2.4
    ...

How would I get that kind of output from Logstash? What filters/codecs should I use? What's the benefit of putting in the effort to make this happen? How can I optimize the fields for indexing and storage? Etc.

Anyway, this is a bit long now... I'll appreciate any feedback I can get. I know I haven't yet researched everything myself, but writing this has clarified things a lot for me, and I think some of it would be a valuable conversation starter. :slight_smile:

What to capture from which logs is largely a matter of preference and need, so most "should it pick up or parse this too" questions can only be answered with "it depends".

Random answers:

Looks like it's information about SSH or other kinds of login sessions. Each line appears to start with a Unix timestamp, right?

Yes, looks like it.

How would you filter this? Looks like there could be a way to use the kv plugin. Hmm...

Yes, I'd start with the kv filter.
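
A minimal sketch, untested and assuming each line really is an epoch timestamp followed by space-separated key=value pairs (the ck_* field names are made up):

filter {
  # Peel off the leading epoch and leave the rest for the kv filter.
  grok {
    match => { "message" => "^%{NUMBER:ck_epoch} %{GREEDYDATA:ck_rest}" }
  }
  # The defaults split fields on whitespace and keys from values on "=".
  kv {
    source => "ck_rest"
  }
  # UNIX also accepts fractional epoch seconds.
  date {
    match => [ "ck_epoch", "UNIX" ]
  }
}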

/var/log/btmp

That's a binary file that you'd need custom tooling to read; it's normally inspected with lastb rather than tailed as text.

gz is binary. Can Filebeat read that?

Not yet, but see issue #637.

Regarding /var/log/apt/history.log: use a multiline codec or filter to join the lines, use whichever timestamp you like as @timestamp, and consider a ruby filter to parse the Upgrade line. You might be able to do it with stock filters, but that could get ugly.
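
Roughly like this, sketched from your sample and untested. The field names are my own, and the ruby bit assumes the Logstash 2.x event API and only handles Upgrade lines:

input {
  file {
    path => "/var/log/apt/history.log"
    # Everything from one Start-Date line up to the next becomes one event.
    codec => multiline {
      pattern => "^Start-Date:"
      negate => true
      what => "previous"
    }
  }
}

filter {
  # Note the two spaces apt puts between the date and the time.
  grok {
    match => { "message" => "Start-Date: (?<start_date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})" }
  }
  date {
    match => [ "start_date", "yyyy-MM-dd  HH:mm:ss" ]
  }
  # Split "name:arch (old, new), name:arch (old, new), ..." into an array
  # of package objects, along the lines of the structure you sketched.
  ruby {
    code => "
      line = event['message'][/^Upgrade: (.*)$/, 1]
      if line
        event['operation'] = 'Upgrade'
        event['packages'] = line.split('), ').map do |pkg|
          name_arch, versions = pkg.split(' (', 2)
          name, arch = name_arch.split(':', 2)
          old_v, new_v = versions.to_s.delete(')').split(', ', 2)
          { 'name' => name, 'architecture' => arch,
            'old_version' => old_v, 'new_version' => new_v }
        end
      end
    "
  }
}

Whether Start-Date or End-Date becomes @timestamp is your call; I'd lean towards Start-Date, since that's when the operation began.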

I agree a lot of filtering depends on the environment. But there are a lot of very common logs and log formats out there that many people have analysed by now. Apache logs, for example: the COMBINEDAPACHELOG grok pattern is pretty darn good at getting everything. It's not perfect, but as a start it saves a lot of time. Same for the stock syslog patterns.
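
To make that concrete: for Apache access logs, the whole first pass is basically just the stock pattern plus the standard date filter that goes with it, nothing custom:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}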

So I was hoping someone could share what they've done, since these logs are on a lot of systems. Even if everyone has to modify it a bit, that'd be a good start.

:slightly_smiling:

Thanks for your random answers. They'll help.