Filebeat Amazon ECS log rotation issue

Hi Elastic Team,

Sorry to bother y'all, but I've been running into an issue using Filebeat on Amazon ECS and would appreciate any help.

Summary:
Filebeat 6.8.1 is deployed on each ECS host instance and forwards logs to an Amazon ElastiCache Redis cluster, from which Logstash pulls the log events using the Redis input plugin.
The Filebeat container itself uses the awslogs driver to send its own logs to CloudWatch Logs, and it is configured to forward the logs of all Docker containers on the host.
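For context, the Logstash side is just the Redis input plugin pulling from the same list Filebeat writes to; a minimal sketch, assuming the same host and key as in the output.redis section of the config further down (everything else left at defaults):

input {
  redis {
    host      => "redis-filebeat"   # assumed: same endpoint as output.redis in the Filebeat config
    port      => 6379
    db        => 0
    key       => "filebeat_beta"    # matches output.redis.key
    data_type => "list"             # the Beats Redis output pushes events onto a Redis list
  }
}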

I am seeing my Filebeat ECS tasks/containers spike to almost their maximum memory for long periods of time before eventually being rotated out by ECS.
When the memory spikes, we see the following error along with a large number of blank lines:
ERROR log/harvester.go:282 Read line error: parsing CRI timestamp: parsing time "

Looking into the issue, the timestamps of the errors appear to coincide with the log rotation performed by the Amazon ecs-init agent that runs on each host instance, specifically the gz compression step at Jul 21 03:16:

-rw-r----- 1 root root  9634140 Jul 23 15:34 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log
-rw-r----- 1 root root 16000151 Jul 23 00:16 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log.1
-rw-r----- 1 root root 16000165 Jul 21 23:41 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log.2
-rw-r----- 1 root root   185681 Jul 21 03:16 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log-20190721.gz
-rw-r----- 1 root root 16000252 Jul 21 00:25 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log.3

Could anyone take a look at my configuration below and let me know whether these errors are due to a misconfiguration on my part?
I believe my configuration is not handling log rotation well, and I'm unsure how to configure it so that it does.
Any assistance would be greatly appreciated.

Thanks for your time,

filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
              multiline.pattern: '^[[:space:]]+(at|\.{3})|^Caused by:|^org.springframework|^java.|\\t+(at|\.{3})'
              multiline.negate: false
              multiline.match: after

filebeat.inputs:
  - type: docker
    containers.ids:
      - "*"
    processors:
      - add_docker_metadata: ~
    multiline.pattern: '^[[:space:]]+(at|\.{3})|^Caused by:|^org.springframework|^java.|\\t+(at|\.{3})'
    multiline.negate: false
    multiline.match: after

processors:
- add_cloud_metadata: ~

output.redis:
  hosts: ["redis-filebeat:6379"]
  key: "filebeat_beta"
  db: 0
  timeout: 5

logging.level: error
logging.to_files: false

Another thing I wanted to add: I believe I saw this same issue a few weeks ago when another application rotated its logs without compression, but I'm unable to dig up proof right now since it was a while ago.

At the time I assumed it was an issue with the logging level, which was set to INFO and producing lots of blank log entries, so I changed it to ERROR, but unfortunately the error persists.

I will add the following line to my docker input to ignore the .gz files and see whether that fixes the issue. The main question in this post, though, is whether anyone knows what causes the increased memory usage and blank log lines, in this case specifically after a log is rotated out and compressed:

exclude_files: ['\.gz$']
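For clarity, that line goes into the docker input shown in the config above (multiline settings omitted here); only the exclude_files entry is new relative to what I already posted:

filebeat.inputs:
  - type: docker
    containers.ids:
      - "*"
    # skip files that have already been rotated out and gz-compressed
    exclude_files: ['\.gz$']
    processors:
      - add_docker_metadata: ~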

Hello,
I believe this was fixed by the new "container" input type; see the relevant PR: https://github.com/elastic/beats/pull/12162
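If it helps, a minimal sketch of what that input looks like on 7.2+, with the path being the default Docker json-file location (adjust it for your hosts):

filebeat.inputs:
  - type: container
    # default location of Docker json-file logs on the host
    paths:
      - '/var/lib/docker/containers/*/*.log'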

Hi @pierhugues,

Thanks for the reply. It's great to see that this is fixed in the newer container input; unfortunately, my Elasticsearch and Kibana are stuck at version 6.6 due to a dependency on the Sentinl alerting plugin, which currently has no plans to support 7.x.

Is there anything I can do to get this working on the current versions, and do you think it is related to log rotation or to something else? Is upgrading to 7.2+ for the container input the only fix for this issue?

I added the exclude_files rule for .gz yesterday and am still waiting to see whether the error comes back, since I'm not yet sure log rotation is the root cause. I'll update this ticket if I see anything.

Thanks for your time,

Hey @pierhugues,

Unfortunately the issue persisted even with exclude_files, but I found out that our instance had logrotate running at certain times, and that rotation of the Docker logs seems to have been the root cause.

I removed the logrotate rule for our Docker logs, have been monitoring for about a week, and haven't seen any issues since.
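In case it's useful to anyone else: instead of running an external logrotate against the json-file logs, Docker's json-file driver can rotate them itself. A sketch of the daemon.json settings on the host (the size and file count are just example values):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}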

Just wanted to let you and the team know in case this is something you are interested in looking into; on my end the issue seems to be resolved.

Thank you

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.