Filebeat (5.0.0Alpha4) and filter


(Aserdp) #1

Our present setup is:

Filebeat (5.0.0-alpha4) -> Kafka -> Logstash -> Elasticsearch

We are looking into filtering (dropping) unneeded log events at the source, i.e. in Filebeat.

A typical log line (from Celery) that we would like to drop looks like this:

{"relativeCreated": 8381439.963102341, "process": 6651, "@timestamp": "2016-08-04T20:41:52.197Z", "args": {"exc": "Retry in 60s", "id": "57d51895-aab5-4662-b458-1b068305836f", "name": "XXXXXXXXXX"}, "module": "job", "funcName": "on_retry", "message": "Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s", "name": "celery.worker.job", "thread": 139742007183168, "created": 1470343312.197371, "threadName": "MainThread", "msecs": 197.3710060119629, "filename": "job.py", "levelno": 20, "processName": "MainProcess", "source_host": "worker-XXXXXXXXXX", "pathname": "XXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py", "lineno": 415, "@version": 1, "levelname": "INFO"}

The filter in filebeat.yml (in reduced form) is:

### Filters
filters:
  - drop_event:
      contains:
          message: "Retry"

The Filebeat log at debug level shows:

2016-08-04T20:42:08Z DBG  filters: drop_event, condition=contains: map[message:Retry]

2016-08-04T20:42:13Z WARN unexpected type *string in contains condition as it accepts only strings.

2016-08-04T20:42:13Z DBG  Publish: {
  "@timestamp": "2016-08-04T20:42:08.510Z",
  "beat": {
    "hostname": "worker-XXXXXXXXXX",
    "name": "worker-XXXXXXXXXX"
  },
  "input_type": "log",
  "message": "{\"relativeCreated\": 8381439.963102341, \"process\": 6651, \"@timestamp\": \"2016-08-04T20:41:52.197Z\", \"args\": {\"exc\": \"Retry in 60s\", \"id\": \"57d51895-aab5-4662-b458-1b068305836f\", \"name\": \"XXXXXXXXXX\"}, \"module\": \"job\", \"funcName\": \"on_retry\", \"message\": \"Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s\", \"name\": \"celery.worker.job\", \"thread\": 139742007183168, \"created\": 1470343312.197371, \"threadName\": \"MainThread\", \"msecs\": 197.3710060119629, \"filename\": \"job.py\", \"levelno\": 20, \"processName\": \"MainProcess\", \"source_host\": \"worker-XXXXXXXXXX\", \"pathname\": \"XXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py\", \"lineno\": 415, \"@version\": 1, \"levelname\": \"INFO\"}",
  "offset": 24647014,
  "role": "worker",
  "source": "XXXXXXXXXX/logs/celery_supervisor.log",
  "type": "workerlog"
}

I have tried various combinations of the "contains" condition and found that either

  • the event that should have been dropped is published anyway,
    OR
  • all events/log lines are dropped, even lines that do not match the condition.

I don't know if we are missing something or doing it all wrong.
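
For reference, the behaviour expected from the filter can be sketched in a few lines of Python. This is only an illustration of the intended semantics, not Filebeat's implementation: the helper names `contains` and `drop_event` are borrowed from the config keys, the sample events are shortened, hypothetical versions of the ones above, and a case-sensitive substring match is an assumption. Since Filebeat publishes the whole raw JSON line as the string field `message`, a substring check for "Retry" on that field should match the retry line and leave other lines alone.

```python
import json


def contains(event, conditions):
    """Return True if every listed field is a string containing the substring."""
    for field, needle in conditions.items():
        value = event.get(field)
        # Assumption: only string fields can match, and matching is case-sensitive.
        if not isinstance(value, str) or needle not in value:
            return False
    return True


def drop_event(events, conditions):
    """Keep only the events that do NOT satisfy the contains condition."""
    return [event for event in events if not contains(event, conditions)]


# Shortened, hypothetical log lines; Filebeat ships each raw line as "message".
retry_line = json.dumps(
    {"funcName": "on_retry", "message": "Task X[1] retry: Retry in 60s"}
)
other_line = json.dumps(
    {"funcName": "on_success", "message": "Task X[2] succeeded"}
)

events = [
    {"message": retry_line, "type": "workerlog"},
    {"message": other_line, "type": "workerlog"},
]

kept = drop_event(events, {"message": "Retry"})
print(len(kept))  # 1: only the non-retry event remains
```

With the configuration above, the equivalent of `kept` is what should reach Kafka; what we actually see is either both events or none.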


(Andrew Kroh) #2

Seems like there is some sort of bug in reading the filter configuration. Would you mind checking whether this is also a problem in the snapshot build (it will be released as alpha5 fairly soon)? The config is changing a bit. See https://www.elastic.co/guide/en/beats/filebeat/5.0/filtering-and-enhancing-data.html

In alpha5 the config will look like this:

processors:
 - drop_event:
     when:
        contains:
           message: "Retry"

(Aserdp) #3

I upgraded Filebeat to 5.0.0-alpha5 and changed the configuration, but the effect is the same: messages containing "Retry" are still published.

From the debug log:

2016-08-04T23:23:03Z INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2016-08-04T23:23:03Z INFO Setup Beat: filebeat; Version: 5.0.0-alpha5
2016-08-04T23:23:03Z DBG  New condition contains: map[message:Retry]
2016-08-04T23:23:03Z DBG  Processors: drop_event, condition=contains: map[message:Retry]

2016-08-04T23:23:03Z WARN unexpected type *string in contains condition as it accepts only strings.
2016-08-04T23:23:03Z DBG  Publish: {
  "@timestamp": "2016-08-04T23:23:03.297Z",
  "beat": {
    "hostname": "worker-XXXXXXXXXXX",
    "name": "worker-XXXXXXXXXXX"
  },
  "input_type": "log",
  "message": "{\"relativeCreated\": 17917357.42020607, \"process\": 6651, \"@timestamp\": \"2016-08-04T23:20:48.114Z\", \"args\": {\"exc\": \"Retry in 60s\", \"id\": \"48bd61be-1b94-415f-8f4f-94ed1c0a463b\", \"name\": \"XXXXXXXXXXX\"}, \"module\": \"job\", \"funcName\": \"on_retry\", \"message\": \"Task XXXXXXXXXXX[48bd61be-1b94-415f-8f4f-94ed1c0a463b] retry: Retry in 60s\", \"name\": \"celery.worker.job\", \"thread\": 139742007183168, \"created\": 1470352848.114828, \"threadName\": \"MainThread\", \"msecs\": 114.82810974121094, \"filename\": \"job.py\", \"levelno\": 20, \"processName\": \"MainProcess\", \"source_host\": \"worker-XXXXXXXXXXX\", \"pathname\": \"XXXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py\", \"lineno\": 415, \"@version\": 1, \"levelname\": \"INFO\"}",
  "offset": 98172086,
  "role": "worker",
  "source": "XXXXXXXXXXX/logs/celery_supervisor.log",
  "type": "workerlog"
}

(Andrew Kroh) #4

Thanks for testing. Looks like a bug. Please open an issue in the elastic/beats repo and we'll investigate it on Monday.


(Aserdp) #5

@andrewkroh Thanks for such an awesome open source product. The least we can do is test it and report bugs.

Issue has been opened: https://github.com/elastic/beats/issues/2178


(Spacewander) #6

@logstash_user Could you reproduce the "unexpected type *string in contains condition as it accepts only strings." error with Filebeat built from the master branch?

I can reproduce that error with the snapshot build, but cannot reproduce it with a build from the latest code.

