Filebeat (5.0.0-alpha4) and filters

Our present setup is:

Filebeat (5.0.0-alpha4) -> Kafka -> Logstash -> Elasticsearch

We are looking into filtering (dropping) unneeded log events at the source, i.e. in Filebeat.

A typical log line (from Celery) that we would like to drop looks like this:

{"relativeCreated": 8381439.963102341, "process": 6651, "@timestamp": "2016-08-04T20:41:52.197Z", "args": {"exc": "Retry in 60s", "id": "57d51895-aab5-4662-b458-1b068305836f", "name": "XXXXXXXXXX"}, "module": "job", "funcName": "on_retry", "message": "Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s", "name": "celery.worker.job", "thread": 139742007183168, "created": 1470343312.197371, "threadName": "MainThread", "msecs": 197.3710060119629, "filename": "job.py", "levelno": 20, "processName": "MainProcess", "source_host": "worker-XXXXXXXXXX", "pathname": "XXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py", "lineno": 415, "@version": 1, "levelname": "INFO"}

The filter in filebeat.yml (in reduced form) is:

### Filters
filters:
  - drop_event:
      contains:
          message: "Retry"
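For reference, the intended behaviour of this filter can be modelled in a few lines. This is a minimal Python sketch, not Filebeat's Go implementation: `contains_condition` and `apply_drop_event` are hypothetical names, and requiring every listed field to match is an assumption of the sketch. It shows that because the raw Celery line is shipped unparsed, the whole JSON text lands in the event's `message` field, so a substring match on "Retry" should select it.

```python
import json


def contains_condition(event: dict, condition: dict) -> bool:
    # Hypothetical model of a "contains" condition: every listed field must
    # hold a string containing the given substring (assumed semantics).
    for field, substring in condition.items():
        value = event.get(field)
        if not isinstance(value, str) or substring not in value:
            return False
    return True


def apply_drop_event(events, condition):
    """Keep only the events that the condition does NOT match."""
    return [e for e in events if not contains_condition(e, condition)]


# The raw JSON line becomes the value of the event's "message" field,
# as in the Publish output of the debug log.
raw_line = json.dumps({
    "funcName": "on_retry",
    "message": "Task XXXXXXXXXX[57d51895] retry: Retry in 60s",
    "levelname": "INFO",
})
retry_event = {"input_type": "log", "message": raw_line}
other_event = {"input_type": "log", "message": '{"message": "Task succeeded"}'}

kept = apply_drop_event([retry_event, other_event], {"message": "Retry"})
print(len(kept))               # 1
print(kept[0] is other_event)  # True
```

In other words, with a working condition only the retry line should be dropped and every other line should be published.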

The Filebeat debug log shows:

2016-08-04T20:42:08Z DBG  filters: drop_event, condition=contains: map[message:Retry]

2016-08-04T20:42:13Z WARN unexpected type *string in contains condition as it accepts only strings.

2016-08-04T20:42:13Z DBG  Publish: {
  "@timestamp": "2016-08-04T20:42:08.510Z",
  "beat": {
    "hostname": "worker-XXXXXXXXXX",
    "name": "worker-XXXXXXXXXX"
  },
  "input_type": "log",
  "message": "{\"relativeCreated\": 8381439.963102341, \"process\": 6651, \"@timestamp\": \"2016-08-04T20:41:52.197Z\", \"args\": {\"exc\": \"Retry in 60s\", \"id\": \"57d51895-aab5-4662-b458-1b068305836f\", \"name\": \"XXXXXXXXXX\"}, \"module\": \"job\", \"funcName\": \"on_retry\", \"message\": \"Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s\", \"name\": \"celery.worker.job\", \"thread\": 139742007183168, \"created\": 1470343312.197371, \"threadName\": \"MainThread\", \"msecs\": 197.3710060119629, \"filename\": \"job.py\", \"levelno\": 20, \"processName\": \"MainProcess\", \"source_host\": \"worker-XXXXXXXXXX\", \"pathname\": \"XXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py\", \"lineno\": 415, \"@version\": 1, \"levelname\": \"INFO\"}",
  "offset": 24647014,
  "role": "worker",
  "source": "XXXXXXXXXX/logs/celery_supervisor.log",
  "type": "workerlog"
}
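One plausible reading of the WARN line: the condition received the `message` value as a Go `*string` (a pointer) instead of a plain `string`, reported it, and treated it as "no match", so the event was published. The Python sketch below models that failure mode only; `StrPointer`, `strict_contains` and `tolerant_contains` are hypothetical names standing in for whatever Filebeat does internally.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("sketch")


class StrPointer:
    """Hypothetical stand-in for a Go *string: a box around a str."""

    def __init__(self, value: str):
        self.value = value


def strict_contains(value, substring: str) -> bool:
    # Models the symptom: anything that is not a plain str is reported
    # and treated as "no match", so drop_event never fires.
    if not isinstance(value, str):
        log.warning("unexpected type %s in contains condition", type(value).__name__)
        return False
    return substring in value


def tolerant_contains(value, substring: str) -> bool:
    # Dereference the box first, then apply the normal substring test.
    if isinstance(value, StrPointer):
        value = value.value
    return isinstance(value, str) and substring in value


boxed = StrPointer("... retry: Retry in 60s")
print(strict_contains(boxed, "Retry"))    # False: event is published
print(tolerant_contains(boxed, "Retry"))  # True: event would be dropped
```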

I have tried various combinations of the "contains" condition and found that either:

  • the event is published even though it should have been dropped,
    OR
  • all events/log lines are dropped, even lines that don't match the condition.

I don't know whether we are missing something or doing it all wrong.

It seems like there is some sort of bug in reading the filter configuration. Would you mind checking whether this is also a problem in the snapshot build? It will be released as alpha5 fairly soon. Note that the config is changing a bit; see "Filtering and Enhancing the Exported Data" in the Filebeat Reference [5.0] (Elastic).

In alpha5 the config will look like this:

processors:
  - drop_event:
      when:
        contains:
          message: "Retry"
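The only structural change between the two configs is that the top-level key becomes `processors` and the condition moves under a `when` key. A minimal Python sketch of that rewrite, with both configs represented as dicts: `filters_to_processors` is a hypothetical helper, not an official migration tool, and it only handles actions whose body is purely a condition, like the `drop_event` in this thread.

```python
import copy
import json


def filters_to_processors(old_config: dict) -> dict:
    """Hypothetical helper: nest each action's condition under 'when'
    and rename the top-level 'filters' key to 'processors'."""
    new_config = copy.deepcopy(old_config)  # leave the input untouched
    actions = new_config.pop("filters", [])
    processors = []
    for action in actions:
        (name, condition), = action.items()  # each list item has one key
        processors.append({name: {"when": condition}})
    new_config["processors"] = processors
    return new_config


alpha4 = {"filters": [{"drop_event": {"contains": {"message": "Retry"}}}]}
alpha5 = filters_to_processors(alpha4)
print(json.dumps(alpha5))
# {"processors": [{"drop_event": {"when": {"contains": {"message": "Retry"}}}}]}
```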

I upgraded Filebeat to 5.0.0-alpha5 and changed the configuration, but the effect is the same: messages containing "Retry" are still published.

From the debug log:

2016-08-04T23:23:03Z INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2016-08-04T23:23:03Z INFO Setup Beat: filebeat; Version: 5.0.0-alpha5
2016-08-04T23:23:03Z DBG  New condition contains: map[message:Retry]
2016-08-04T23:23:03Z DBG  Processors: drop_event, condition=contains: map[message:Retry]

2016-08-04T23:23:03Z WARN unexpected type *string in contains condition as it accepts only strings.
2016-08-04T23:23:03Z DBG  Publish: {
  "@timestamp": "2016-08-04T23:23:03.297Z",
  "beat": {
    "hostname": "worker-XXXXXXXXXXX",
    "name": "worker-XXXXXXXXXXX"
  },
  "input_type": "log",
  "message": "{\"relativeCreated\": 17917357.42020607, \"process\": 6651, \"@timestamp\": \"2016-08-04T23:20:48.114Z\", \"args\": {\"exc\": \"Retry in 60s\", \"id\": \"48bd61be-1b94-415f-8f4f-94ed1c0a463b\", \"name\": \"XXXXXXXXXXX\"}, \"module\": \"job\", \"funcName\": \"on_retry\", \"message\": \"Task XXXXXXXXXXX[48bd61be-1b94-415f-8f4f-94ed1c0a463b] retry: Retry in 60s\", \"name\": \"celery.worker.job\", \"thread\": 139742007183168, \"created\": 1470352848.114828, \"threadName\": \"MainThread\", \"msecs\": 114.82810974121094, \"filename\": \"job.py\", \"levelno\": 20, \"processName\": \"MainProcess\", \"source_host\": \"worker-XXXXXXXXXXX\", \"pathname\": \"XXXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py\", \"lineno\": 415, \"@version\": 1, \"levelname\": \"INFO\"}",
  "offset": 98172086,
  "role": "worker",
  "source": "XXXXXXXXXXX/logs/celery_supervisor.log",
  "type": "workerlog"
}

Thanks for testing. Looks like a bug. Please open an issue in the elastic/beats repo and we'll investigate it on Monday.

@andrewkroh Thanks for such an awesome open-source product; the least we can do is test it and report bugs.

Issue has been opened: https://github.com/elastic/beats/issues/2178

@logstash_user Could you reproduce the "unexpected type *string in contains condition as it accepts only strings." error with Filebeat built from the master branch?

I can reproduce that error with the snapshot build, but cannot reproduce it with a build from the latest code.