Filebeat, file truncated on GlusterFS

An httpd log file is stored on a GlusterFS partition, never rotated, and read by Filebeat (6.2.3).

I should also mention that Filebeat runs in a Docker container, with the log file mapped in as a volume.

This same configuration worked fine while the file was on a local disk. Since we moved it to a GlusterFS partition, Filebeat sees the file as truncated every few minutes and starts reading from the beginning:

INFO log/harvester.go:229 File was truncated. Begin reading file from offset 0: /data/logfiles/production_access.log

I have read in the FAQ that reading logs from a network filesystem is not recommended. However, most of the concerns there seem to be about log rotation, and there is no rotation happening here: it's a 160 MB file that is never truncated.

Is there any configuration option that I could try?

Could you share the full debug log of Filebeat? (./filebeat -e -d "*")
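
Since Filebeat runs inside Docker, one way to get that output is to start the container with the debug flags and capture its logs. A rough sketch only, the container name, image tag, and mount paths below are assumptions to adjust to your setup:

docker run --name filebeat-debug \
	-v /data/logfiles:/data/logfiles:ro \
	-v "$(pwd)/filebeat.yml":/usr/share/filebeat/filebeat.yml:ro \
	docker.elastic.co/beats/filebeat:6.2.3 -e -d "*"

# in another shell, save the output to a file
docker logs -f filebeat-debug > filebeat-debug.log 2>&1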

It seems to me that GlusterFS has its own inode management, which might lead to weird behaviour in Filebeat. I am still looking through the GlusterFS docs, so I can come back with more useful comments on your case.

Here is an excerpt around the truncation (which happens at 11:27:17). I removed the actual events to make it more readable.

It looks like the problem is a mismatch between the file size and the offset.

2018-04-04T11:27:09.831Z	DEBUG	[harvester]	log/log.go:85	End of file reached: /data/logfiles/production_access.log; Backoff now.
2018-04-04T11:27:10.832Z	DEBUG	[logstash]	logstash/async.go:142	2 events out of 2 events sent to logstash host logstash:5044. Continue sending
2018-04-04T11:27:11.230Z	DEBUG	[registrar]	registrar/registrar.go:200	Processing 2 events
2018-04-04T11:27:11.230Z	DEBUG	[registrar]	registrar/registrar.go:193	Registrar states cleaned up. Before: 1, After: 1
2018-04-04T11:27:11.230Z	DEBUG	[registrar]	registrar/registrar.go:228	Write registry file: /usr/share/filebeat/data/registry
2018-04-04T11:27:11.313Z	DEBUG	[registrar]	registrar/registrar.go:253	Registry file updated. 1 states written.
2018-04-04T11:27:11.699Z	DEBUG	[prospector]	prospector/prospector.go:124	Run prospector
2018-04-04T11:27:11.699Z	DEBUG	[prospector]	log/prospector.go:147	Start next scan
2018-04-04T11:27:11.701Z	DEBUG	[prospector]	log/prospector.go:361	Check file for harvesting: /data/logfiles/production_access.log
2018-04-04T11:27:11.701Z	DEBUG	[prospector]	log/prospector.go:447	Update existing file for harvesting: /data/logfiles/production_access.log, offset: 174437390
2018-04-04T11:27:11.701Z	DEBUG	[prospector]	log/prospector.go:499	Harvester for file is still running: /data/logfiles/production_access.log
2018-04-04T11:27:11.701Z	DEBUG	[prospector]	log/prospector.go:168	Prospector states cleaned up. Before: 1, After: 1
2018-04-04T11:27:11.834Z	DEBUG	[harvester]	log/log.go:85	End of file reached: /data/logfiles/production_access.log; Backoff now.
2018-04-04T11:27:12.836Z	DEBUG	[logstash]	logstash/async.go:142	3 events out of 3 events sent to logstash host logstash:5044. Continue sending
2018-04-04T11:27:12.886Z	DEBUG	[registrar]	registrar/registrar.go:200	Processing 3 events
2018-04-04T11:27:12.886Z	DEBUG	[registrar]	registrar/registrar.go:193	Registrar states cleaned up. Before: 1, After: 1
2018-04-04T11:27:12.886Z	DEBUG	[registrar]	registrar/registrar.go:228	Write registry file: /usr/share/filebeat/data/registry
2018-04-04T11:27:12.951Z	DEBUG	[registrar]	registrar/registrar.go:253	Registry file updated. 1 states written.
2018-04-04T11:27:13.838Z	DEBUG	[harvester]	log/log.go:85	End of file reached: /data/logfiles/production_access.log; Backoff now.
2018-04-04T11:27:14.840Z	DEBUG	[logstash]	logstash/async.go:142	3 events out of 3 events sent to logstash host logstash:5044. Continue sending
2018-04-04T11:27:17.006Z	DEBUG	[harvester]	log/log.go:118	File was truncated as offset (174439171) > size (174438794): /data/logfiles/production_access.log
2018-04-04T11:27:17.007Z	INFO	log/harvester.go:229	File was truncated. Begin reading file from offset 0: /data/logfiles/production_access.log
2018-04-04T11:27:17.094Z	DEBUG	[harvester]	log/harvester.go:468	Stopping harvester for file: /data/logfiles/production_access.log
2018-04-04T11:27:17.094Z	DEBUG	[harvester]	log/harvester.go:478	Closing file: /data/logfiles/production_access.log
2018-04-04T11:27:17.094Z	DEBUG	[harvester]	log/harvester.go:348	Update state: /data/logfiles/production_access.log, offset: 0
2018-04-04T11:27:17.095Z	DEBUG	[harvester]	log/harvester.go:489	harvester cleanup finished for file: /data/logfiles/production_access.log
2018-04-04T11:27:17.108Z	DEBUG	[registrar]	registrar/registrar.go:200	Processing 3 events
2018-04-04T11:27:17.108Z	DEBUG	[registrar]	registrar/registrar.go:193	Registrar states cleaned up. Before: 1, After: 1
2018-04-04T11:27:17.108Z	DEBUG	[registrar]	registrar/registrar.go:228	Write registry file: /usr/share/filebeat/data/registry
2018-04-04T11:27:17.330Z	DEBUG	[registrar]	registrar/registrar.go:253	Registry file updated. 1 states written.
2018-04-04T11:27:17.895Z	DEBUG	[logstash]	logstash/async.go:142	3 events out of 3 events sent to logstash host logstash:5044. Continue sending
2018-04-04T11:27:18.057Z	DEBUG	[registrar]	registrar/registrar.go:200	Processing 4 events
2018-04-04T11:27:18.057Z	DEBUG	[registrar]	registrar/registrar.go:193	Registrar states cleaned up. Before: 1, After: 1
2018-04-04T11:27:18.057Z	DEBUG	[registrar]	registrar/registrar.go:228	Write registry file: /usr/share/filebeat/data/registry
2018-04-04T11:27:18.111Z	DEBUG	[registrar]	registrar/registrar.go:253	Registry file updated. 1 states written.
2018-04-04T11:27:21.032Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2130,"time":2138},"total":{"ticks":6510,"time":6524,"value":6510},"user":{"ticks":4380,"time":4386}},"info":{"ephemeral_id":"c6ba2b0c-9412-4e0c-946e-35bd760ff108","uptime":{"ms":1230018}},"memstats":{"gc_next":5815936,"memory_alloc":4472304,"memory_total":944266416,"rss":143360}},"filebeat":{"events":{"active":-5,"added":53,"done":58},"harvester":{"closed":1,"open_files":0,"running":0},"prospector":{"log":{"files":{"truncated":1}}}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":57,"active":-5,"batches":13,"total":52},"read":{"bytes":84},"write":{"bytes":7622}},"pipeline":{"clients":1,"events":{"active":0,"filtered":1,"published":52,"total":53},"queue":{"acked":57}}},"registrar":{"states":{"current":1,"update":58},"writes":14},"system":{"load":{"1":3.37,"15":2.78,"5":2.71,"norm":{"1":0.4213,"15":0.3475,"5":0.3388}}}}}}

@kvch, I ran a script to monitor the inode and the size of this log file, to see whether GlusterFS was reporting the file's statistics incorrectly.

While the truncation was reported, the inode never changed and the file size never decreased.

So I don't understand how it can be that offset > size.

If interested, here's the script (it uses stat):

#!/usr/bin/env bash
# Poll the log file's inode and size and stop as soon as either the inode
# changes or the size decreases (i.e. a real truncation is observed).

INTERVAL=0.25

LINE=$(stat -c "%i %s" production_access.log)
INODE=${LINE% *}
SIZE=${LINE#* }
OLD_INODE=${INODE}
OLD_SIZE=${SIZE}

echo
while true
do
	sleep "${INTERVAL}"
	LINE=$(stat -c "%i %s" production_access.log)
	INODE=${LINE% *}
	SIZE=${LINE#* }

	# A changed inode would mean the file was replaced (e.g. rotated).
	# Compare as strings: GlusterFS inode numbers can exceed bash's signed integer range.
	if [[ ${INODE} != ${OLD_INODE} ]]
	then
		echo
		echo "inode changed: ${OLD_INODE} -> ${INODE}"
		exit 1
	fi

	# A smaller size would mean the file was actually truncated.
	if [[ ${SIZE} -lt ${OLD_SIZE} ]]
	then
		echo
		echo "size decreased: ${OLD_SIZE} -> ${SIZE}"
		exit 2
	fi

	# Otherwise keep updating the current values in place.
	echo -en "\rinode: ${INODE} | size: ${SIZE}"
	OLD_INODE=${INODE}
	OLD_SIZE=${SIZE}
done
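
To try it yourself, save the script next to the log on the GlusterFS mount and run it from there; the path and file name below are just examples:

cd /data/logfiles        # the directory on the GlusterFS mount
chmod +x monitor_log.sh  # assuming the script was saved as monitor_log.sh
./monitor_log.sh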

I ran some more tests, monitoring the log file and the content of the registry.
The important events are listed below:

2018-04-06T15:49:40.884Z        DEBUG   [registrar]     registrar/registrar.go:228      Write registry file: /usr/share/filebeat/data/registry
[{"source":"/data/logfiles/production_access.log","offset":239952547,"timestamp":"2018-04-06T15:49:40.883995409Z","ttl":-1,"type":"log","FileStateOS":{"inode":9545578365002182417,"device":38}}]

2018-04-06T15:49:40.887Z        DEBUG   [harvester]     log/log.go:118  File was truncated as offset (239953398) > size (239952966): /data/logfiles/production_access.log

15:49:40.884 : offset = 239952547
15:49:40.887 (+3ms) : offset = 239953398 (+851), file = 239952966 (+419)

So 3 ms after the last offset was written to the registry, the in-memory offset has grown larger than the file size.
I can't figure out how this is possible: as far as I understand, the offset is only advanced after a line has been read from the file, so it should never be able to exceed the file size.
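
One thing that would explain it is the reported size lagging behind the data that is actually readable. A quick way to test that (just a sketch, to be run against the file on the GlusterFS mount) is to read the whole file first and stat it second; since the file only ever grows, stat should never report fewer bytes than were just read:

# read first, stat second: REPORTED < READABLE can only happen if stat is stale
while true
do
	READABLE=$(wc -c < production_access.log)
	REPORTED=$(stat -c "%s" production_access.log)
	if [[ ${REPORTED} -lt ${READABLE} ]]
	then
		echo "stale size: ${READABLE} bytes were readable, stat now reports ${REPORTED}"
	fi
	sleep 0.25
done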

I guess this can be closed, as it doesn't seem to be a Filebeat issue: tail -f also reports truncations on the same file.
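
For reference, that check was just following the file and watching for the truncation warning from GNU tail (the exact wording may differ between coreutils versions):

tail -f production_access.log > /dev/null
# prints something like: tail: production_access.log: file truncated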

I am glad it is resolved. However, I am not so happy that you can't read your logs.

In case this is useful to someone else in the same situation: it turns out this is actually a bug in GlusterFS, which has not been fixed yet: https://bugzilla.redhat.com/show_bug.cgi?id=1390050
It affects GlusterFS 3.10.x and 3.11.x and does not seem to happen on older versions.
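
For anyone who wants to check whether their installation falls in that range, the versions can be queried like this (the first command is for the FUSE client on the machine with the mount, the second for the Gluster CLI on a server node):

glusterfs --version
gluster --version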


Thanks for sharing it!
