Our present setup is:
FileBeat (5.0.0 Alpha4) -> Kafka -> Logstash -> Elasticsearch
We are looking into filtering out (dropping) unneeded log events at the source, that is, in FileBeat itself.
A typical log line (from celery) that we would like to drop looks like this:
{"relativeCreated": 8381439.963102341, "process": 6651, "@timestamp": "2016-08-04T20:41:52.197Z", "args": {"exc": "Retry in 60s", "id": "57d51895-aab5-4662-b458-1b068305836f", "name": "XXXXXXXXXX"}, "module": "job", "funcName": "on_retry", "message": "Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s", "name": "celery.worker.job", "thread": 139742007183168, "created": 1470343312.197371, "threadName": "MainThread", "msecs": 197.3710060119629, "filename": "job.py", "levelno": 20, "processName": "MainProcess", "source_host": "worker-XXXXXXXXXX", "pathname": "XXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py", "lineno": 415, "@version": 1, "levelname": "INFO"}
The filter in filebeat.yml (in reduced form) is:
### Filters
filters:
  - drop_event:
      contains:
        message: "Retry"
The filebeat log in debug shows:
2016-08-04T20:42:08Z DBG filters: drop_event, condition=contains: map[message:Retry]
2016-08-04T20:42:13Z WARN unexpected type *string in contains condition as it accepts only strings.
2016-08-04T20:42:13Z DBG Publish: {
"@timestamp": "2016-08-04T20:42:08.510Z",
"beat": {
"hostname": "worker-XXXXXXXXXX",
"name": "worker-XXXXXXXXXX"
},
"input_type": "log",
"message": "{\"relativeCreated\": 8381439.963102341, \"process\": 6651, \"@timestamp\": \"2016-08-04T20:41:52.197Z\", \"args\": {\"exc\": \"Retry in 60s\", \"id\": \"57d51895-aab5-4662-b458-1b068305836f\", \"name\": \"XXXXXXXXXX\"}, \"module\": \"job\", \"funcName\": \"on_retry\", \"message\": \"Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s\", \"name\": \"celery.worker.job\", \"thread\": 139742007183168, \"created\": 1470343312.197371, \"threadName\": \"MainThread\", \"msecs\": 197.3710060119629, \"filename\": \"job.py\", \"levelno\": 20, \"processName\": \"MainProcess\", \"source_host\": \"worker-XXXXXXXXXX\", \"pathname\": \"XXXXXXXXXX/venv/local/lib/python2.7/site-packages/celery/worker/job.py\", \"lineno\": 415, \"@version\": 1, \"levelname\": \"INFO\"}",
"offset": 24647014,
"role": "worker",
"source": "XXXXXXXXXX/logs/celery_supervisor.log",
"type": "workerlog"
}
I have tried various combinations of the "contains" condition and have found that either
- the event is published, although it should have been dropped,
OR - all events/log lines are dropped, even log lines that do not match the condition.
I don't know if we are missing something or doing it all wrong.
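To make the expected behaviour concrete, here is a minimal Python sketch of how we understand the "contains" condition: a plain substring test on the "message" field, which in the published event holds the whole celery JSON line as one string. This is only our reading of the semantics, not Filebeat's actual implementation; the should_drop helper and the second, non-matching event are hypothetical and made up for illustration.

```python
import json


def should_drop(event, field="message", substring="Retry"):
    """Return True if event[field] is a string that contains substring.

    Hypothetical helper: our reading of what "contains" should do,
    not how Filebeat implements it.
    """
    value = event.get(field)
    return isinstance(value, str) and substring in value


# Shortened version of the published event: the raw celery JSON line
# is carried as a single string in "message".
celery_line = json.dumps({
    "funcName": "on_retry",
    "message": "Task XXXXXXXXXX[57d51895-aab5-4662-b458-1b068305836f] retry: Retry in 60s",
    "levelname": "INFO",
})
retry_event = {"input_type": "log", "message": celery_line, "type": "workerlog"}

# Made-up event without "Retry", to show what should still be published.
other_line = json.dumps({"message": "Task succeeded", "levelname": "INFO"})
other_event = {"input_type": "log", "message": other_line, "type": "workerlog"}

print(should_drop(retry_event))  # True: we expect this event to be dropped
print(should_drop(other_event))  # False: we expect this event to be published
```

With this reading, only the first event should be dropped, which is not what we observe with the filter above.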