Docker input stuck when a bad offset is saved in the registry file - bug?

I'm using Filebeat with the docker input. The offset saved in the registry for one Docker log file was incorrect (probably because of an unclean shutdown, or because the Docker output file was rotated while Filebeat was down).

When Filebeat started again, the offset pointed to the middle of a JSON output line. Filebeat tried to read from there and failed while attempting to parse the fragment as CRI format. The harvester keeps retrying this line forever (it is stuck on it).

Is there a way to make the harvester/input skip a line after some number of unsuccessful parse attempts? Once a line is complete, there is little chance that it will parse later, so maybe the input should skip it on the first unsuccessful attempt and log an error/warning message.

The source Docker log file looked like this:

{"log":"AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.32.209.40. Set the 'ServerName' directive globally to suppress this message\n","stream":"stderr","time":"2019-05-17T13:45:42.217652719Z"}
{"log":"[Fri May 17 15:45:42.218593 2019] [suexec:notice] [pid 6] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)\n","stream":"stdout","time":"2019-05-17T13:45:42.219846354Z"}
{"log":"AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.32.209.40. Set the 'ServerName' directive globally to suppress this message\n","stream":"stdout","time":"2019-05-17T13:45:42.251525479Z"}
{"log":"[Fri May 17 15:45:42.251725 2019] [auth_digest:notice] [pid 6] AH01757: generating secret for digest authentication...\n","stream":"stdout","time":"2019-05-17T13:45:42.251821402Z"}
{"log":"[Fri May 17 15:45:42.253189 2019] [lbmethod_heartbeat:notice] [pid 6] AH02282: No slotmem from mod_heartmonitor\n","stream":"stdout","time":"2019-05-17T13:45:42.253329903Z"}
{"log":"[Fri May 17 15:45:42.272757 2019] [mpm_prefork:notice] [pid 6] AH00163: Apache/2.4.6 (CentOS) PHP/5.4.16 configured -- resuming normal operations\n","stream":"stdout","time":"2019-05-17T13:45:42.274386001Z"}
{"log":"[Fri May 17 15:45:42.272790 2019] [core:notice] [pid 6] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'\n","stream":"stdout","time":"2019-05-17T13:45:42.274416367Z"} 

Filebeat output log:

2019-05-20T13:15:57.561+0200    INFO    log/harvester.go:254    Harvester started for file: /var/lib/docker/containers/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8-json.log
2019-05-20T13:15:57.561+0200    ERROR   log/harvester.go:281    Read line error: parsing CRI timestamp: parsing time "09.40." as "2006-01-02T15:04:05Z07:00": cannot parse "0." as "2006"; File: /var/lib/docker/containers/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8-json.log
2019-05-20T13:16:00.577+0200    INFO    log/harvester.go:254    Harvester started for file: /var/lib/docker/containers/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8-json.log
2019-05-20T13:16:00.577+0200    ERROR   log/harvester.go:281    Read line error: parsing CRI timestamp: parsing time "09.40." as "2006-01-02T15:04:05Z07:00": cannot parse "0." as "2006"; File: /var/lib/docker/containers/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8/b65baef300ff46398c813f0cff0912f9c2a788948cc1be759df648bedb7d99d8-json.log

Because of the wrong offset in the registry, the input saw the first line as:

09.40. Set the 'ServerName' directive globally to suppress this message\n","stream":"stderr","time":"2019-05-17T13:45:42.217652719Z"}
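To illustrate why a JSON file ends up with a CRI timestamp error, here is a minimal Python sketch of format auto-detection. It assumes the input treats a line starting with `{` as Docker JSON and anything else as CRI; `parse_line` is a hypothetical stand-in, not Filebeat's actual code, and the log line is shortened from the one above.

```python
import json


def parse_line(line: str) -> dict:
    """Hypothetical stand-in for the input's format auto-detection."""
    if line.startswith("{"):
        # Docker json-file format: one JSON object per line.
        return json.loads(line)
    # Otherwise assume CRI, where the first space-separated field is a timestamp.
    timestamp, _, rest = line.partition(" ")
    # Simplified check: an RFC 3339 timestamp starts with a four-digit year.
    if not timestamp[:4].isdigit():
        raise ValueError(f"parsing CRI timestamp: {timestamp!r}")
    return {"time": timestamp, "rest": rest}


# Shortened version of the first log line from the file above.
full = ('{"log":"using 10.32.209.40. Set the directive\\n",'
        '"stream":"stderr","time":"2019-05-17T13:45:42.217652719Z"}')

# A bad offset lands in the middle of the IP address.
fragment = full[full.index("09.40."):]
```

Parsing `full` succeeds, while `parse_line(fragment)` raises because `09.40.` is taken for the CRI timestamp field, which matches the error in the log.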

I think Filebeat should behave better in this scenario. For now, the best workaround would be to stop Filebeat and remove the entry for that file from the registry.

If the workaround you suggested has to be performed manually, it definitely fixes a single occurrence of the problem, but it is impossible to do by hand when using Filebeat to collect logs from hundreds or thousands of containers in a datacenter.
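For many containers, the check could in principle be scripted. The sketch below is in Python and rests on assumptions: that registry entries are available as a list of dicts with `source` (file path) and `offset` fields, and that Filebeat is stopped while the registry is edited. `offset_is_line_aligned` and `prune_states` are hypothetical helper names. An offset is treated as valid only if it is 0 or sits directly after a newline.

```python
import os


def offset_is_line_aligned(path: str, offset: int) -> bool:
    """Return True if offset points at the start of a line in path."""
    if offset == 0:
        return True
    if offset < 0:
        return False
    try:
        size = os.path.getsize(path)
    except OSError:
        # File is missing or unreadable: treat the saved offset as invalid.
        return False
    if offset > size:
        # File was rotated or truncated after the offset was saved.
        return False
    with open(path, "rb") as f:
        f.seek(offset - 1)
        return f.read(1) == b"\n"


def prune_states(states: list) -> list:
    """Keep only entries whose offset falls on a line boundary."""
    return [
        s for s in states
        if offset_is_line_aligned(s["source"], s["offset"])
    ]
```

Entries dropped this way would be re-read from the beginning of the file, so duplicates are possible; that is the same trade-off as removing the entry by hand.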

I can prepare a simple PR that will skip unparsable lines.
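Roughly, the behaviour I have in mind looks like the Python sketch below. It is only an illustration of the idea, not the Filebeat harvester: `read_events` is a hypothetical name, and it handles only the Docker JSON format. The point is that the offset advances past a bad line, so the reader never returns to it.

```python
import json
import logging
from typing import Iterable, Iterator, Tuple

log = logging.getLogger("harvester_sketch")


def read_events(lines: Iterable[str]) -> Iterator[Tuple[int, dict]]:
    """Yield (next_offset, event) pairs, skipping lines that do not parse."""
    offset = 0
    for line in lines:
        start = offset
        # Advance the offset first so a bad line is never retried.
        offset += len(line.encode("utf-8"))
        try:
            event = json.loads(line)
            if not isinstance(event, dict):
                raise ValueError("not a JSON object")
        except ValueError as err:
            # json.JSONDecodeError is a subclass of ValueError.
            log.warning("skipping unparsable line at offset %d: %s", start, err)
            continue
        yield offset, event
```

Given the truncated fragment followed by complete lines, this logs one warning for the fragment and then yields the remaining events with offsets that account for the skipped bytes.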
