Logstash vs NFS (null characters)

Hi, there is an issue with Logstash reading log files from a remote server over NFS.
The issue is described in a thread on Google Groups, and the Stack Overflow question "Java InputStream read methods returning ASCII 'NUL' characters for file in a NFS mount location" explains why it happens.
The short version of the explanation, if you don't want to follow the links:

The NUL characters described in this question appear due to asynchronous writes to the file being read from. More specifically, packets of data from the remote file writer have arrived out of order, and the NAS buffer has committed a later packet and padded the area for the unreceived data with NUL characters. When the missing packet is received, the NAS buffer commits it, overwriting those null characters.
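To make the quoted explanation concrete, here is a minimal local sketch (plain Python on a local filesystem, not actual NFS; the file name is just a placeholder). Writing at an offset past the end of a file leaves a hole, and that hole reads back as NUL bytes, which is exactly the padding described above:

```python
# Simulate a "later packet" being committed before an earlier one:
# writing past EOF leaves a hole, and holes read back as NUL bytes.
with open("demo.log", "wb") as f:
    f.write(b"first line\n")   # 11 bytes committed
    f.seek(11 + 20)            # skip 20 bytes, as if a packet is missing
    f.write(b"later line\n")   # the out-of-order write lands here

with open("demo.log", "rb") as f:
    print(f.read())
# b'first line\n\x00\x00...\x00later line\n' -- the gap reads as NULs
# until (in the real NFS case) the missing write arrives and fills it.
```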

So, has anyone else experienced the same problem? How can it be fixed? (Via NFS mount options, or do I have to mount the logs onto the Logstash servers some other way?)

There is also a dirty workaround: re-read the line whenever a \u0000 character appears. But it seems stock Logstash can't do this.
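For what it's worth, that dirty workaround could look something like the sketch below outside of Logstash (my own sketch, not an existing plugin; path and timings are placeholders). The idea is to only advance the read offset once a line is complete and NUL-free, so a padded line gets re-read on the next poll:

```python
import time

def tail_without_nuls(path, poll=1.0):
    """Yield complete lines from `path`, re-reading any line that
    still contains NUL padding from an unfinished NFS write."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            line = f.readline()
            # Retry if there is nothing new, the line is still being
            # written, or it contains the NUL padding described above.
            if not line or not line.endswith(b"\n") or b"\x00" in line:
                time.sleep(poll)
                continue
            offset = f.tell()  # commit the offset only once the line is clean
            yield line.decode("utf-8", errors="replace")

# Hypothetical usage:
# for entry in tail_without_nuls("/mnt/nfs/app.log"):
#     print(entry)
```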

I think I'm having a problem with the same cause: Logstash reads log files over NFS, and everything seems to work fine on startup, but then those files end up with one line full of 'NUL' characters and Logstash stops reading them (even though the files are still being updated). When I restart Logstash, the 'NUL' line disappears and the files are read again. (I made a comment on this issue: https://github.com/logstash-plugins/logstash-input-file/issues/38#issuecomment-154411416.)

Did you find a way to fix your problem?

I had the same problem and couldn't find a way around it. Our (dirty) hack/workaround is a scheduled task running every 5 minutes to:

  • shut down the Logstash indexer
  • copy the file
  • restart Logstash and index the copied file

This is okay for us as we don't need perfectly real-time indexing, and it has worked reliably for a couple of months now. It also solved the problems with locked resources we encountered before.
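In case it helps anyone, the scheduled task boils down to something like this sketch (assuming Logstash runs as a systemd service named logstash; the service name and both paths are placeholders, and Logstash is pointed at the local copy rather than the NFS mount):

```python
import shutil
import subprocess
import time

SRC = "/mnt/nfs/app.log"             # NFS-mounted original (placeholder)
DST = "/var/spool/logstash/app.log"  # local copy that Logstash reads

def cycle():
    # Stop the indexer so it doesn't hold the file open during the copy.
    subprocess.run(["systemctl", "stop", "logstash"], check=True)
    # Snapshot the file to local disk; Logstash then reads a stable
    # local file instead of the live NFS mount.
    shutil.copy2(SRC, DST)
    subprocess.run(["systemctl", "start", "logstash"], check=True)

while True:
    cycle()
    time.sleep(300)  # every 5 minutes, matching the schedule above
```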

To avoid this problem we switched to remotefs, a fairly simple FUSE filesystem for NFS-like exporting of logs.
It is roughly 1,000 lines of code and the developer responds very quickly, so you can change the code with minimal effort.
For Logstash I've implemented two patches:

  1. Proxying the inode number from the server.
  2. An IPv6 fix.

We use it in production with about 200 GB of logs per day, and it works (not very fast, but fast enough for us).
The only problem is that with remotefs Logstash can't handle file renames: once a file is renamed, it re-reads it from the beginning. We avoid that by specifying a more exact path in the Logstash config so renamed files aren't matched, as illustrated below (but it is not really a solution).
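To illustrate what that "more exact path" buys us (file names here are made up): a broad glob also matches rotated or renamed files, while an exact name does not, and that difference is the whole workaround:

```python
import fnmatch

names = ["app.log", "app.log.1", "app.log-20151109"]  # hypothetical names

# A broad pattern also matches renamed/rotated files, which would
# then be picked up and re-read from the beginning:
print([n for n in names if fnmatch.fnmatch(n, "app.log*")])
# ['app.log', 'app.log.1', 'app.log-20151109']

# Pinning the exact name sidesteps the renamed copies:
print([n for n in names if fnmatch.fnmatch(n, "app.log")])
# ['app.log']
```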

We haven't found a way to make NFS work. We also tried GlusterFS, but it doesn't work with IPv6 as of now.