Duplicate events with Filebeat on windows on service restart

We have a log file we are harvesting with Filebeat on some Windows hosts (2008 R2 Datacenter). Whenever the filebeat service is restarted the file gets fully resent.

Here the log of filebeat shutting down:

2017-03-15T18:16:10+02:00 INFO No non-zero metrics in the last 30s
2017-03-15T18:16:26+02:00 INFO Stopping filebeat
2017-03-15T18:16:26+02:00 INFO Stopping Crawler
2017-03-15T18:16:26+02:00 INFO Stopping 1 prospectors
2017-03-15T18:16:26+02:00 INFO Stopping Prospector
2017-03-15T18:16:26+02:00 INFO Prospector ticker stopped
2017-03-15T18:16:26+02:00 INFO Reader was closed: F:/wwwLogs/flights-wsapi/WebServicesWebApi.log. Closing.
2017-03-15T18:16:26+02:00 INFO Prospector channel stopped
2017-03-15T18:16:26+02:00 INFO Crawler stopped
2017-03-15T18:16:26+02:00 INFO Stopping spooler
2017-03-15T18:16:26+02:00 INFO Stopping Registrar
2017-03-15T18:16:26+02:00 INFO Ending Registrar
2017-03-15T18:16:26+02:00 INFO Total non-zero values:  libbeat.logstash.publish.write_bytes=25069994 registar.states.current=1 registrar.writes=136 filebeat.harvester.started=1 libbeat.logstash.publish.write_errors=11 libbeat.logstash.published_but_not_acked_events=10424 libbeat.logstash.publish.read_errors=4 libbeat.publisher.published_events=272780 libbeat.logstash.publish.read_bytes=43641 publish.events=272781 registrar.states.update=272781 libbeat.logstash.call_count.PublishEvents=143 libbeat.logstash.published_and_acked_events=272780 filebeat.harvester.closed=1
2017-03-15T18:16:26+02:00 INFO Uptime: 3m15.984s
2017-03-15T18:16:26+02:00 INFO filebeat stopped.

and filebeat starting up:

2017-03-15T18:16:42+02:00 INFO Home path: [C:\Program Files\Filebeat] Config path: [C:\Program Files\Filebeat] Data path: [C:\\ProgramData\\filebeat] Logs path: [C:\Program Files\Filebeat\logs]
2017-03-15T18:16:42+02:00 INFO Setup Beat: filebeat; Version: 5.2.1
2017-03-15T18:16:42+02:00 INFO Max Retries set to: 3
2017-03-15T18:16:42+02:00 INFO Activated logstash as output plugin.
2017-03-15T18:16:42+02:00 INFO Publisher name: METADIAKOPES30
2017-03-15T18:16:42+02:00 INFO Flush Interval set to: 1s
2017-03-15T18:16:42+02:00 INFO Max Bulk Size set to: 2048
2017-03-15T18:16:42+02:00 INFO Additional config files are fetched from: C:/Program Files/Filebeat/conf.d/
2017-03-15T18:16:42+02:00 INFO Additional configs loaded from: C:\Program Files\Filebeat\conf.d\wsj.yml
2017-03-15T18:16:42+02:00 INFO filebeat start running.
2017-03-15T18:16:42+02:00 INFO Registry file set to: C:/Program Files/Filebeat/registry
2017-03-15T18:16:42+02:00 INFO Loading registrar data from C:/Program Files/Filebeat/registry
2017-03-15T18:16:42+02:00 INFO States Loaded from registrar: 1
2017-03-15T18:16:42+02:00 INFO Loading Prospectors: 1
2017-03-15T18:16:42+02:00 INFO Prospector with previous states loaded: 0
2017-03-15T18:16:42+02:00 INFO Starting Registrar
2017-03-15T18:16:42+02:00 INFO Start sending events to output
2017-03-15T18:16:42+02:00 INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2017-03-15T18:16:42+02:00 INFO Loading Prospectors completed. Number of prospectors: 1
2017-03-15T18:16:42+02:00 INFO All prospectors are initialised and running with 1 states to persist
2017-03-15T18:16:42+02:00 INFO Starting prospector of type: log
2017-03-15T18:16:42+02:00 INFO Harvester started for file: F:/wwwLogs/flights-wsapi/WebServicesWebApi.log

The registry file seems to reset the offset to 0 and thus resending the file. There's only one entry here and that looks ok - same FileStateOS data before and after the restart. But we also see a registry.old file when the service starts which looks like a copy of the registry file before the new start and the offset resetting.

[{
    "source": "F:/wwwLogs/flights-wsapi/WebServicesWebApi.log",
    "offset": 97601249,
    "FileStateOS": {
        "idxhi": 149225472,
        "idxlo": 510,
        "vol": 2488112230
    },
    "timestamp": "2017-03-15T18:19:08.1504+02:00",
    "ttl": -1
}]
1 Like

Can you provide the FileStateOS values before and after the restart?

What filesystem is F:\ using and is it a local disk or network share?

FileStateOS values stay the same as I mention in the post. We are on AWS and F: is a local EBS disk.

Here are three states of the registry file 1. working normally, 2. after stopping it, 3. after starting it again:

1. [{"source":"F:/wwwLogs/flights-wsapi/WebServicesWebApi.log","offset":126197724,"FileStateOS":{"idxhi":103743488,"idxlo":221,"vol":2488112230},"timestamp":"2017-03-16T19:09:37.2604005+02:00","ttl":-1}]
2. [{"source":"F:/wwwLogs/flights-wsapi/WebServicesWebApi.log","offset":126336375,"FileStateOS":{"idxhi":103743488,"idxlo":221,"vol":2488112230},"timestamp":"2017-03-16T19:10:12.2734005+02:00","ttl":-1}]
3. [{"source":"F:/wwwLogs/flights-wsapi/WebServicesWebApi.log","offset":3598255,"FileStateOS":{"idxhi":103743488,"idxlo":221,"vol":2488112230},"timestamp":"2017-03-16T19:10:31.6366005+02:00","ttl":-1}]

As you can see FileStateOS stays the same and the offset has rewinded.

Thanks for showing the registry before and after.

This sounds similar to another post: Filebeat on Windows seem to not use the registry file

(edit: oh, just saw you posted there too. good)

What @ruflin mentioned in the other post caught my eye:

I remember we had an issue about path matching in the 5.0-beta versions but this should have been fixed.

We too use slashes as path separators for the monitored files like @Steiniche. I tested changing it to backslashes and it works correctly! I also tested single quotes, double quotes (with double backslashes) and no quotes (with double backslashes again) and all these seem to be working great.

I have tested it on our staging environment, if anything goes wrong on production where there's a much bigger number of hosts and files to test with I'll post again.

Thanks

@ruflin Probably somewhere in the code we are missing a call to filepath.Clean to normalize the path.

@eirc Thanks a lot for the further digging. Could you post all the config options that work and don't work? So I can create some tests for it?

@andrewkroh Yeah, and I have a suspicion where :frowning:

@eirc Do you have a chance to test a snapshot build? The reason is asking because there is a change in master / 5.3 to convert all paths to absolute paths: https://github.com/elastic/beats/blob/master/filebeat/prospector/prospector_log.go#L219 And this conversion also runs the Clean part. Nightly builds can be found here: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

So even this fix was not intended for the above, this is the place I was thinking that things could go wrong, and it could fix it. Would be great to hear if the master build fixes the issue.

Up to now we've been setting all the path settings with slashes. The ones we use are:

  • filebeat.registry_file
  • filebeat.config_dir
  • output.logstash.ssl.certificate_authorities
  • logging.files.path
  • filebeat.prospectors.paths

AFAIK the only setting with an issue was filebeat.prospectors.paths due to the unclean check against the registry file states but for consistency's sake we updated all our paths to double quoted double backslashes.

I can also confirm that the nightly you linked works correctly. The paths in the registry file are now double quoted double backslashes while in my config I have set them as single slashes. There's no harvester started message for the file and there's one prospector with previous states loaded. And of course no double entries on Elasticsearch.

So thanks for everything and I hope we helped here :slight_smile:

Thanks for testing, great to hear it works. About the paths, I was referring to the paths you defined in filebeat.prospectors.paths:. Could you post the different "versions" you tried out? Just to make sure I will also test the same versions.

I tested these 3 versions of my file's path:

  • "F:/wwwLogs/flights-wsapi/WebServicesWebApi.log"
  • "F:\\wwwLogs\\flights-wsapi\\WebServicesWebApi.log"
  • 'F:\wwwLogs\flights-wsapi\WebServicesWebApi.log'

The first one is the only one that breaks on 5.2, on the nightly I tested the first two and they worked.

Ok, so in 5.2 we can "solve" the problem by telling the user to use option 2 or 3 for the path config and in 5.3 it should be solved already.

I can confirm that I have / as a path seperator.
I will try and test with \ instead.

I have just tested option 3 (one blackslash) instead of a slash and events is not getting resend anymore.
Thanks for the workaround @eirc!

I also opened https://github.com/elastic/beats/pull/3781 to do some more path cleanup to make sure we don't hit any other issues in the future.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.