How to "exclude_lines but make these exceptions"?

I'm shipping syslog with a custom template that logs the severity. I'm using exclude_lines to avoid shipping anything with a severity greater than info. I'd like to make an exception for a handful of processes that log interesting messages at debug level.

A quick test shows I can have multiple prospectors process the same file. Is this behavior expected? I'm worried that I may be abusing it somehow and that this may change in the future. Details of my quick test are below in case they provide more context. Thanks!

$ cat filebeat.yml
output:
  console:
    pretty: true
filebeat:
  prospectors:
    -
      paths:
        - /tmp/test.log
      include_lines:
        - 'KEEP'
    -
      paths:
        - /tmp/test.log
      exclude_lines:
        - 'SEV:[6-9]'

$ cat test.log
NOISE SEV:6
KEEP SEV:6
NOISE SEV:7
SIGNAL SEV:4

$ rm -f .filebeat; filebeat
{
  "@timestamp": "2016-04-10T20:29:58.958Z",
  "beat": {
    "hostname": "home",
    "name": "home"
  },
  "count": 1,
  "fields": null,
  "input_type": "log",
  "message": "SIGNAL SEV:4",
  "offset": 35,
  "source": "/tmp/test.log",
  "type": "log"
}
{
  "@timestamp": "2016-04-10T20:29:58.959Z",
  "beat": {
    "hostname": "home",
    "name": "home"
  },
  "count": 1,
  "fields": null,
  "input_type": "log",
  "message": "KEEP SEV:6",
  "offset": 12,
  "source": "/tmp/test.log",
  "type": "log"
}

I would strongly recommend against doing that, as it will mess with the registry file. Currently the registry file uses the path as the identifier, so there will be two processes modifying the same state data. I assume this will lead to some weird behaviour.
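
To illustrate the concern, here is a rough sketch of what happens when state is keyed only by path. This is not Filebeat's actual registry code: the `registry` type and its `update` method are hypothetical stand-ins, and the offsets are made up.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the registry file:
// it maps a file path to the last offset recorded for that file.
type registry map[string]int64

// update records the offset a prospector has reached for a path.
func (r registry) update(path string, offset int64) {
	r[path] = offset
}

func main() {
	reg := registry{}

	// The first prospector has read far into the file.
	reg.update("/tmp/test.log", 48)
	// The second prospector is further behind in the same file
	// and overwrites the first one's state, because the key is the same.
	reg.update("/tmp/test.log", 23)

	// Only one offset per path survives.
	fmt.Println(len(reg), reg["/tmp/test.log"])
}
```

Whichever prospector writes last wins, so after a restart both would resume from an offset that is only correct for one of them.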

The current solution would be to start two instances of Filebeat and configure a different registry file location for each.

Some refactoring of the registry file is hopefully happening in the near future. Part of this refactoring could make the above usage possible: if the identifier of a harvested file is unique, the same file could be harvested by different prospectors without conflicts in the registry file. See here for some more details: https://github.com/elastic/beats/issues/1022#issuecomment-206737447

Bummer. It's probably too late, but I wish include_lines behaved more as an exception list for exclude_lines - could support the current behavior with exclude_lines: ['.']. Thanks for the reply, though. Saved me a nasty surprise later.

Here are some more details on the exclude_lines implementation: https://github.com/elastic/beats/pull/430

Not sure I understand what you mean by ['.']

If you're willing to entertain the idea of making include_lines behave more like an exception list for exclude_lines, then users could use exclude_lines: ['.'] along with their current include_lines to get the current behavior. I'm throwing that out there as a way to avoid adding another option (e.g. exclude_lines_exceptions).

It's odd to me since before include_lines was available, all lines were already included. Someone interested in using include_lines is really interested in excluding. Then the question becomes how does include_lines interact with exclude_lines. I wish it was more like:


if len(h.ExcludeLinesRegexp) > 0 {
	if MatchAnyRegexps(h.ExcludeLinesRegexp, line) {
		if len(h.IncludeLinesRegexp) > 0 {
			if !MatchAnyRegexps(h.IncludeLinesRegexp, line) {
				logp.Debug("harvester", "Drop line as it matches an exclude pattern and does not match any of the include patterns %s", line)
				return false
			}
		} else {
			logp.Debug("harvester", "Drop line as it matches an exclude pattern and no include patterns defined %s", line)
			return false
		}
	}
}
return true

I can't find any other discussion about this, so I'm probably just being weird - feel free to ignore :)

As @monica is working on this it probably makes sense that she chimes in here.

The way it is implemented at the moment, include_lines is applied before exclude_lines. The idea was to be able to include Apache logs but exclude the debug messages coming from Apache by setting:

include_lines: ['apache']
exclude_lines: ['^DBG']

We are currently working on a generic solution for filtering that would be part of libbeat and available to all the Beats, letting you easily choose the fields to export or the events to drop based on a condition.

Your use case is interesting and we will consider it when implementing generic filtering.

Where is that conversation happening? I'm subscribed to this issue but there hasn't been much activity lately. Thanks!