filebeat hogged the IO
After starting filebeat, I found that disk IO on the machine became very high. After some digging, I found that filebeat kept writing a file named checkpoint.new to disk from the moment it started.
So I searched the source code for "checkpoint.new" to find which function produces that file. It turned out that checkpoint.new is written by "WriteCheckpoint" in diskstore.go:292. Before calling "WriteCheckpoint", another function, "mustCheckpoint" (store.go:213), is executed to decide whether to call "WriteCheckpoint". The code of mustCheckpoint is as follows:
// mustCheckpoint returns true if the store is required to execute a checkpoint
// operation, either by predicate or by some internal state detecting a problem
// with the log file.
func (s *diskstore) mustCheckpoint() bool {
	return s.logInvalid || s.checkpointPred(s.logFileSize)
}
If s.logInvalid is true, mustCheckpoint always returns true, regardless of what s.checkpointPred says. So the store is always required to execute a checkpoint.
I found a warning containing "Incomplete or corrupted log file in" in the filebeat log. This message is printed when there was an error loading the log file (store.go:130). At that point, s.logInvalid is set to err != nil (store.go:133). Unfortunately, in this case err != nil is true, i.e. s.logInvalid is true.
As mentioned above, once s.logInvalid is true, the store is always required to execute a checkpoint.
So once there is a problem with the log file, it leads to frequent checkpoints, which in turn cause the high IO.
Could this be a bug?
diskstore.go: beats/diskstore.go at main · elastic/beats · GitHub
store.go: beats/store.go at v7.17.2 · elastic/beats · GitHub