FileBeat - corrupt file on rotation


(Jeff Mills) #1

Hi,
I'm using FileBeat to output to a file and then an ArcSight Regex file connector (usenonlockingwindowsfilereader=true, followexternalrotation=true ) pulls into ArcSight.
I have the rotation set to 2 because I don't really need to keep it.

However, the final file in the rotation becomes inaccessible to everyone, including Administrators. It seems to be corrupt.

Here is my output config:
output.file:
path: "C:\dnslogs\"
filename: dnsjsonoutput.log
rotate_every_kb: 102400
number_of_files: 2

This is what happens.
On first rotation, dnsjsonoutput.log is successfully renamed to dnsjsonoutput.log.1
On the next rotation, for some reason dnsjsonoutput.log.2 appears. It seems to have the oldest modified date, so I assume it's been renamed from dnsjsonoutput.log.1. However, the file is corrupt. I can't delete, take ownership, or even see the current owner of the file.
Eventually, I get the following error in the FileBeat log:
2016-12-14T15:15:12+11:00 CRIT Unable to write events to file: remove C:\dnslogs\dnsjsonoutput.log.2: Access is denied.
2016-12-14T15:15:12+11:00 INFO Error bulk publishing events: remove C:\dnslogs\dnsjsonoutput.log.2: Access is denied.
If I don't start the ArcSight Connector, dnsjsonoutput.log.2 is never created.
Any idea what might be happening here?
Edit: As A side note, I can manually delete dnsjsonoutput.log.1 before the next round of rotation happens, and there is no problem.


(Jeff Mills) #2

Update: Switched from 2 to 5 rotated files, and the same thing happens to dnsjsonoutput.log.5
I know ArcSight is not locking the file because I can delete it myself if I get to it before FileBeat does.


(ruflin) #3

A few questions

  • What is your operating system (Windows something I assume)
  • Do you use any special file system?
  • Did you try what happens if you have ArcSight disabled?
  • Any chance to share such a "corrupt" file?
  • How long does it take on average between each rotation?

(Jeff Mills) #4
  • Windows 2012 64bit
  • Nothing special on the FS
  • I need to do further testing with this, but it will take some time to get priv'd access.
  • I doubt this will be an easy one since I don't even appear to have read access to the file. I have a feeling I'm going to need to boot into a Linux environment to fix this. It's a production server and we're in a change freeze. Happy to share with elastic support (we are a customer).
  • 2-5 mins per rotation (100MB files), although I also have other corrupt files of 200MB and approx 650MB from different configurations. Having 1GB of corrupt files is a bit annoying.

I might have to try rotating smaller files in our test environment to see if I can reproduce the issue because I don't want to create any more of these files in production.


(Jeff Mills) #5

Have not been able to reproduce in the test environment, although it is a much quieter environment.
Set the rotation down to 512k, but it seems to be rotating fine.


(Jeff Mills) #6

Tested on another production box. Same issue.
Unfortunately filebeat stopped debug logging, so I can't actually see what it's logs say.
However, ArcSight logs that it's finished processing the file right after the new one is started, so I don't think it's slow processing causing it.
I have also tried "usealternaterotationdetection=true" in ArcSight.

Dec 16 2016 11:06:33 C:\dnslogs\dnslog.json 0 File processing ended: Success
Dec 16 2016 11:06:33 C:\dnslogs\dnslog.json 0 File processing started
Dec 16 2016 11:03:50 0 Connector Raw Event Statistics
Dec 16 2016 11:02:00 C:\dnslogs\dnslog.json 0 File processing ended: Success
Dec 16 2016 11:02:00 C:\dnslogs\dnslog.json 0 File processing started
Dec 16 2016 10:58:50 0 Connector Raw Event Statistics
Dec 16 2016 10:58:23 0 Agent configuration change received
Dec 16 2016 10:57:55 C:\dnslogs\dnslog.json 0 File processing ended: Success
Dec 16 2016 10:57:55 C:\dnslogs\dnslog.json 0 File processing started
Dec 16 2016 10:57:31 0 Agent configuration change received
Dec 16 2016 10:53:50 0 Connector Raw Event Statistics
Dec 16 2016 10:53:28 C:\dnslogs\dnslog.json 0 File processing ended: Success
Dec 16 2016 10:53:28 C:\dnslogs\dnslog.json 0 File processing started
Dec 16 2016 10:50:53 0 Event Transport Fail Over
Dec 16 2016 10:48:50 0 Agent [BRS_BANK_MSDNS_AD1] type [sdkrfilereader] started
Dec 16 2016 10:48:50 C:\dnslogs\dnslog.json File Opened 0 Device connection up

I'm going to switch over to sending to logstash/Elastic (then hopefully output to ArcSight from there), so for me this issue is going to go away.
I have been running FileBeat on the original server without ArcSight reading the output file, and it has been happy for 30 mins or so.
So at this point, it does seem to be FileBeat and ArcSight not liking each other, but who knows why. I previously used nxlog with ArcSight in the same way and they played nicely together, but we wanted to use something with commercial support - hence FileBeat.

Edit: Also confirmed that I can't read the corrupt file, so can't get a copy at the moment.


(ruflin) #7

Glad to hear it works with Logstash as expected. I don't really know ArcSight but I assume the log rotation of filebeat and ArcSight do not play well together. Filebeat does not do any access right changes on the log file so I would expect it to be ArcSight. Having multiple sources reading and writing to a file is a little bit tricky on Windows to set all the correct flags. I assume there something does not work as expected.

Feel free to bring this up with support so we can dig deeper when you get potential access to the log files.


(Jeff Mills) #8

I had access today, but filebeat stopped debug logging before the problem occurred.

What would cause the extra filename to be created?
When it's working properly, with 5 files configured, we have x.log x.log.1 x.log.2 x.log.3 and x.log.4
When this breaks, it's x.log.5 that is broken that would not normally even exist.

If ArcSight was really locking the file, surely it wouldn't be able to be renamed 4 times before the issue occurs?
I don't think ArcSight is changing any permissions, I think the file just becomes corrupt, so windows can't read the security any more.


(ruflin) #9

I think what happens is that there is very quickly a x.log.5 file for rotation reason. x.log.4 is quickly renamed to x.log.5 and as soon as rotation of all files is completed, the file is removed.

So the difference with the 5th file is that is deleted. And it could well be that ArcSight does not allow that. It still does not explain why the file gets corrupt. The only explanation is that both tools "fight" about the deleting and corrupt the file on the way.


(Jeff Mills) #10

Ok that makes sense.
What doesn't make sense is that I can delete any of the files myself before the file corrupts - I wouldn't be able to do that if the files were locked. Arcsight should only be reading x.log and x.log.1 for a short time until it's finished processing it. The ArcSight connector log seems to say it's done within the same second that filebeat rotates.
I can't replicate it on a quieter test server, so it's got something to do with a busier server it seems. I think the best way to see what's going on is to use the pstools process monitor or something like that to watch the file locks.

The problem is, I can't keep testing on our busy (production) servers because every time it leaves behind a file I can't do anything with. It's not a good look.


(Jeff Mills) #11

Just saw this same issue in the test environment over the weekend, so I'm going to enable filebeat debug logging and see what it shows.


(ruflin) #12

Thanks a lot for investigating this further. Looking forward to some filebeat debug log files.

About the corrupt file: Does it still have a size? Can you somehow get at the content of the file?


(Jeff Mills) #13

File size is whatever the rotation size has been set to.
There is no way to even read the file within the Windows OS currently. I think the only way would be to boot in to a different OS, which I might be able to do on the test server.


(ruflin) #14

Finally back online. Any chance you could share the corrupt file?


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.