The NUL characters described in this question appear because of asynchronous writes to the file being read. More specifically, packets of data from the remote writer have arrived out of order: the NAS has committed a later packet and padded the region reserved for the not-yet-received data with NUL characters. When the missing packet arrives, the NAS commits it, overwriting those NUL characters.
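To illustrate the effect at the filesystem level, here is a minimal Python sketch (the demo path is arbitrary): seeking past the current end of a file and writing, which is effectively what an out-of-order commit does, leaves the gap reading back as NUL bytes until the missing data is written over it.

```python
import os, tempfile

# Demo of NUL padding: writing at an offset past the end of a file leaves
# the intervening gap filled with NUL bytes when read back, just like a
# later NFS packet being committed before an earlier one.
path = os.path.join(tempfile.gettempdir(), "nul_demo.log")

with open(path, "wb") as f:
    f.write(b"first line\n")   # bytes 0-10 committed in order
    f.seek(30)                 # a later packet lands at offset 30...
    f.write(b"third line\n")   # ...before the middle data has arrived

with open(path, "rb") as f:
    print(f.read())
# b'first line\n\x00\x00...\x00third line\n' -- the gap reads as NULs
# until the missing data finally overwrites it.
```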
So, has anyone else experienced the same problem? How can it be fixed (via NFS options, or do I have to mount the logs onto the logstash servers some other way)?
There is also a dirty workaround: re-read a line whenever a \u0000 character appears in it, but it seems stock logstash can't do this.
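For reference, a rough Python sketch of that re-read idea (the path and poll interval are placeholders, and this is a standalone tailer, not a logstash plugin): whenever a freshly read line still contains NUL padding, seek back to its start and retry later instead of emitting it.

```python
import time

def follow(path, poll_interval=1.0):
    """Tail a file, re-reading any line that still contains NUL padding."""
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line or not line.endswith(b"\n"):
                f.seek(pos)               # incomplete line: wait for more data
                time.sleep(poll_interval)
                continue
            if b"\x00" in line:
                f.seek(pos)               # NUL padding: retry once the gap is filled
                time.sleep(poll_interval)
                continue
            yield line.decode("utf-8", errors="replace")

# usage: for event in follow("/var/log/app.log"): ...
```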
I think I'm having a problem with the same cause: logstash reads log files over NFS, and everything seems to work fine on startup, but then those files end up with one line full of 'NUL' characters and logstash stops reading them (even though the files are still being updated). When I restart logstash, the 'NUL' line disappears and the files are read again. (I made a comment on this issue: https://github.com/logstash-plugins/logstash-input-file/issues/38#issuecomment-154411416.)
I had the same problem and couldn't find a way around it. Our (dirty) hack/workaround is to have a scheduled task running every 5 minutes (a sketch follows below) to:
shut down the logstash indexer
copy the file
restart logstash and index the file
This is okay for us as we don't need perfectly real-time indexing, and it has worked reliably for a couple of months now. It also solved the problems with locked resources we encountered before.
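Roughly, the task looks like the Python sketch below; the service name and both paths are assumptions about our setup, so adjust them for your init system and layout.

```python
import shutil
import subprocess

NFS_LOG = "/mnt/nfs/app.log"               # file on the NFS share (placeholder)
LOCAL_LOG = "/var/spool/logstash/app.log"  # local copy that logstash reads (placeholder)

subprocess.run(["systemctl", "stop", "logstash"], check=True)   # shut down the indexer
shutil.copyfile(NFS_LOG, LOCAL_LOG)                             # snapshot the file locally
subprocess.run(["systemctl", "start", "logstash"], check=True)  # restart and re-index
```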
To avoid this problem we switched to remotefs. It is a pretty simple FUSE filesystem for NFS-like exporting of logs.
It is roughly 1000 lines of code, and the developer responds very quickly, so you can change the code with minimal effort.
For logstash I've implemented two patches:
a patch to proxy the inode number from the server
an IPv6 fix
So we use it in production with about 200 GB of logs per day, and it works (not very fast, but fast enough for us).
The only problem is that with remotefs logstash can't handle a file being renamed and then re-read from the beginning. We avoid that by specifying a more exact path in the logstash config so that renamed files aren't matched (but that's not really a solution).
And we haven't found a way to make NFS work. We also tried GlusterFS, but it doesn't work with IPv6 at the moment.