While using Filebeat, I ran into an odd issue where two servers configured the same way behave very differently: one works perfectly fine, while the other runs at a very slow pace. Since the configuration is identical, the problem most likely lies somewhere in the infrastructure and not in Filebeat itself.
Comparing the slow and the fast server, I noticed differences in the following metrics:
- monitoring.metrics.libbeat.output.events.active: around 4000 on the slow server, under 10 on the fast one.
- monitoring.metrics.libbeat.output.events.failed: around 4000 on the slow server, zero on the fast one.
To troubleshoot this issue, I turned on all the debug logs for a few minutes by running:
filebeat -e -d "*"
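For reference, I believe the same level of detail can also be enabled persistently in filebeat.yml; the selector names below are my assumption about how to narrow the output to the publishing/Kafka components, not something I have confirmed:

logging.level: debug
# selectors limit debug logging to specific components;
# "kafka" and "publish" are my guesses at the relevant ones here
logging.selectors: ["kafka", "publish"]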
Reading through the entire debug log, I couldn't find any clue as to why some events end up in the failed state.
Hence my questions:
- Did I miss a configuration flag to enable Kafka output debug logs?
- If not, should the Kafka output generate logs when messages reach the failed status?
Here is the full metrics output of the slow server (note that output.events.failed and pipeline.events.retry are both 3905, while only 46 events were ever acked):
{
  "monitoring": {
    "metrics": {
      "beat": {
        "cgroup": {
          "memory": {
            "mem": {
              "usage": {
                "bytes": 139218944
              }
            }
          }
        },
        "cpu": {
          "system": {
            "ticks": 590,
            "time": {
              "ms": 36
            }
          },
          "total": {
            "ticks": 3380,
            "time": {
              "ms": 233
            },
            "value": 3380
          },
          "user": {
            "ticks": 2790,
            "time": {
              "ms": 197
            }
          }
        },
        "handles": {
          "limit": {
            "hard": 1000000,
            "soft": 1000000
          },
          "open": 21
        },
        "info": {
          "ephemeral_id": "0914acdc-9743-41ef-9819-bed28d68f8b3",
          "uptime": {
            "ms": 540080
          },
          "version": "7.14.1"
        },
        "memstats": {
          "gc_next": 66830880,
          "memory_alloc": 57164504,
          "memory_total": 228143544,
          "rss": 145899520
        },
        "runtime": {
          "goroutines": 90
        }
      },
      "filebeat": {
        "harvester": {
          "open_files": 1,
          "running": 1
        }
      },
      "libbeat": {
        "config": {
          "module": {
            "running": 0
          }
        },
        "output": {
          "events": {
            "acked": 46,
            "active": 3905,
            "batches": 2,
            "failed": 3905,
            "total": 3905
          }
        },
        "outputs": {
          "kafka": {
            "bytes_read": 56346,
            "bytes_write": 175304
          }
        },
        "pipeline": {
          "clients": 1,
          "events": {
            "active": 4117,
            "retry": 3905
          }
        }
      },
      "registrar": {
        "states": {
          "current": 1
        }
      },
      "system": {
        "load": {
          "1": 2.94,
          "15": 3.46,
          "5": 3.31,
          "norm": {
            "1": 0.3675,
            "15": 0.4325,
            "5": 0.4138
          }
        }
      }
    }
  }
}
Here is the Filebeat configuration:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /mongo/shard_*/mongo.log
  scan_frequency: 5s
  clean_removed: true
  close_removed: true
  close_renamed: true
  max_bytes: 2000000
  fields_under_root: true
  fields:
    type: mongo-shard
    local_timezone: UTC
    role: mongod
  multiline:
    pattern: '^[0-9]{1,}-[0-9]{1,}-[0-9]{1,}T[0-9]{1,}:[0-9]{1,}:[0-9]{1,}.[0-9]{1,}'
    negate: true
    match: after
    max_lines: 5000
    timeout: 5s

output.kafka:
  enabled: true
  hosts: ['myhost:9096']
  username: ${MYUSER}
  password: ${MYPASSWORD}
  sasl.mechanism: SCRAM-SHA-256
  ssl.enabled: true
  topic: mytopic
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

path.data: /mongo/filebeat/data
path.logs: /mongo/filebeat/logs
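Not part of the config above, but for completeness, these are the extra Kafka output knobs I am aware of that might make this easier to trace on the broker side; the values are only what I understand the defaults to be, and the client_id value is just a label I would pick, so treat this as a sketch rather than my actual setup:

output.kafka:
  # ...existing settings stay as they are; these would be additions...
  client_id: filebeat-slow-host   # shows up in broker-side logs/auditing, name is arbitrary
  timeout: 30                     # seconds to wait for broker responses (default, as far as I know)
  broker_timeout: 10s             # how long a broker waits for the required ACKs (default, as far as I know)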