Filebeat Amazon ECS log rotation issue

Hi Elastic Team,

Sorry to bother y'all, but I've been running into an issue using Filebeat on Amazon ECS and would appreciate any help.

Summary:
Filebeat 6.8.1 is deployed on each ECS host instance and forwards logs to an Amazon ElastiCache Redis cluster, from which Logstash pulls the log events using the Redis input plugin.
The Filebeat container itself uses the awslogs driver to send its own logs to CloudWatch Logs, and it is configured to forward the logs of all Docker containers on the host.
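For context, the Logstash side is just the Redis input plugin pulling from the same list Filebeat writes to; a minimal sketch, assuming the same host and key as in the output.redis section of the config further down (everything else left at defaults):

input {
  redis {
    host      => "redis-filebeat"   # assumed: same endpoint as output.redis in the Filebeat config
    port      => 6379
    db        => 0
    key       => "filebeat_beta"    # matches output.redis.key
    data_type => "list"             # the Beats Redis output pushes events onto a Redis list
  }
}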

I am seeing my Filebeat ECS tasks/containers spike to almost their maximum memory for long periods of time before eventually being rotated out by ECS.
When the memory spikes, we see the following error along with a large number of blank lines:
ERROR log/harvester.go:282 Read line error: parsing CRI timestamp: parsing time "

Looking into the issue, the timestamps of the errors appear to coincide with the log rotation performed by the Amazon ecs-init agent that runs on each host instance, specifically the gz compression step at Jul 21 03:16:

-rw-r----- 1 root root  9634140 Jul 23 15:34 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log
-rw-r----- 1 root root 16000151 Jul 23 00:16 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log.1
-rw-r----- 1 root root 16000165 Jul 21 23:41 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log.2
-rw-r----- 1 root root   185681 Jul 21 03:16 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log-20190721.gz
-rw-r----- 1 root root 16000252 Jul 21 00:25 e905f2afd21d6b423afa80a0101097018ab50783f73be7033cb6a80aa00850f2-json.log.3

Could anyone take a look at my configuration below and let me know whether these errors are due to a misconfiguration on my part?
I believe my configuration is not handling log rotation well, and I'm unsure how to configure it so that it does.
Any assistance would be greatly appreciated.

Thanks for your time,

filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
              multiline.pattern: '^[[:space:]]+(at|\.{3})|^Caused by:|^org.springframework|^java.|\\t+(at|\.{3})'
              multiline.negate: false
              multiline.match: after

filebeat.inputs:
  - type: docker
    containers.ids:
      - "*"
    processors:
      - add_docker_metadata: ~
    multiline.pattern: '^[[:space:]]+(at|\.{3})|^Caused by:|^org.springframework|^java.|\\t+(at|\.{3})'
    multiline.negate: false
    multiline.match: after

processors:
- add_cloud_metadata: ~

output.redis:
  hosts: ["redis-filebeat:6379"]
  key: "filebeat_beta"
  db: 0
  timeout: 5

logging.level: error
logging.to_files: false

Another thing I wanted to add: I believe I saw this same issue a few weeks ago when another application rotated its logs without compression, but I'm unable to dig up proof right now since it was a while ago.

At the time I assumed it was an issue with the logging level, which was set to INFO and producing lots of blank log entries, so I changed it to ERROR, but unfortunately the error persists.

I will add the following line to my docker input to ignore the .gz files and see whether that fixes the issue. The main question in this post, though, is whether anyone knows what causes the increased memory usage and blank log lines, in this case specifically after a log is rotated out and compressed:

exclude_files: ['\.gz$']
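For clarity, that line goes into the docker input shown in the config above (multiline settings omitted here); only the exclude_files entry is new relative to what I already posted:

filebeat.inputs:
  - type: docker
    containers.ids:
      - "*"
    # skip files that have already been rotated out and gz-compressed
    exclude_files: ['\.gz$']
    processors:
      - add_docker_metadata: ~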

Hello,
I believe this was fixed by the new "container" input type; see the relevant PR: https://github.com/elastic/beats/pull/12162
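If it helps, a minimal sketch of what that input looks like on 7.2+, with the path being the default Docker json-file location (adjust it for your hosts):

filebeat.inputs:
  - type: container
    # default location of Docker json-file logs on the host
    paths:
      - '/var/lib/docker/containers/*/*.log'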

Hi @pierhugues,

Thanks for the reply. It's great to see that this is fixed in the newer container input; unfortunately, my Elasticsearch and Kibana are stuck at version 6.6 due to a dependency on the Sentinl alerting plugin, which currently has no plans to support 7.x.

Is there anything I can do to get this working on the current versions, and do you think it is related to log rotation or to something else? Is upgrading to 7.2+ for the container input the only fix for this issue?

I added the exclude_files rule for .gz yesterday and am still waiting to see whether the error comes back, since I'm not yet sure log rotation is the root cause. I'll update this ticket if I see anything.

Thanks for your time,

Hey @pierhugues,

Unfortunately the issue persisted even with exclude_files, but I found out that our instance had logrotate running at certain times, and that rotation of the Docker logs seems to have been the root cause.

I removed the logrotate rule for our Docker logs, have been monitoring for about a week, and haven't seen any issues since.
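In case it's useful to anyone else: instead of running an external logrotate against the json-file logs, Docker's json-file driver can rotate them itself. A sketch of the daemon.json settings on the host (the size and file count are just example values):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}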

Just wanted to let you and the team know in case this is something you are interested in looking into; on my end the issue seems to be resolved.

Thank you

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.