Filebeat hogged the IO

After starting Filebeat, I noticed that disk IO on the machine became very high. After some digging, I found that Filebeat kept writing a file named checkpoint.new to disk from the moment it started.

So I searched the source code for "checkpoint.new" to find which function produces that file. It turns out that checkpoint.new is produced by "WriteCheckpoint" in diskstore.go:292. Before calling "WriteCheckpoint", another function, "mustCheckpoint" (store.go:213), is executed to decide whether to call it. The code of mustCheckpoint is as follows:


// mustCheckpoint returns true if the store is required to execute a checkpoint
// operation, either by predicate or by some internal state detecting a problem
// with the log file.
func (s *diskstore) mustCheckpoint() bool {
	return s.logInvalid || s.checkpointPred(s.logFileSize)
}

If s.logInvalid is true, mustCheckpoint always returns true regardless of the log file size, so the store is required to execute a checkpoint every time.

I found a warning containing "Incomplete or corrupted log file in" in the Filebeat log. This message is printed when there was an error loading the log file (store.go:130). At that point, s.logInvalid is set to err != nil (store.go:133). Unfortunately, in this case err != nil is true, i.e. s.logInvalid is true.

As mentioned above, once s.logInvalid is true, the store is always required to execute a checkpoint.

So once there is a problem with the log file, it leads to frequent checkpoints and ultimately causes high IO.
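To make the effect concrete, here is a minimal, self-contained sketch of the behavior I believe I am seeing. It is not the actual beats code: toyStore, logOperation, countCheckpoints and the 1000-byte threshold are all made up for illustration, and it assumes (as my reading above suggests) that the flag is never cleared after a checkpoint. Only the mustCheckpoint expression mirrors the real function.

```go
package main

import "fmt"

// toyStore is a hypothetical, simplified stand-in for the disk store.
// It only models the fields that mustCheckpoint looks at.
type toyStore struct {
	logInvalid     bool
	logFileSize    uint64
	checkpointPred func(uint64) bool
	checkpoints    int // number of checkpoints executed so far
}

// mustCheckpoint mirrors the expression quoted from store.go.
func (s *toyStore) mustCheckpoint() bool {
	return s.logInvalid || s.checkpointPred(s.logFileSize)
}

// logOperation models one update: grow the log, then checkpoint if required.
// Assumption: the checkpoint resets the log size but does NOT reset logInvalid.
func (s *toyStore) logOperation(size uint64) {
	s.logFileSize += size
	if s.mustCheckpoint() {
		s.checkpoints++
		s.logFileSize = 0
	}
}

// countCheckpoints runs ops updates of 100 bytes each and returns how many
// checkpoints were triggered. The 1000-byte threshold is an arbitrary choice.
func countCheckpoints(logInvalid bool, ops int) int {
	s := &toyStore{
		logInvalid:     logInvalid,
		checkpointPred: func(size uint64) bool { return size >= 1000 },
	}
	for i := 0; i < ops; i++ {
		s.logOperation(100)
	}
	return s.checkpoints
}

func main() {
	fmt.Println("healthy log:", countCheckpoints(false, 100))
	fmt.Println("invalid log:", countCheckpoints(true, 100))
}
```

In this toy model a healthy log checkpoints once every ten updates, while a log flagged as invalid checkpoints on every single update, which would match the constant rewriting of checkpoint.new.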

Could this be a bug?

diskstore.go: beats/diskstore.go at main · elastic/beats · GitHub

store.go: beats/store.go at v7.17.2 · elastic/beats · GitHub