Filebeat not able to catch up with rotating container logs

We have a situation where container logs are rotated ~5 times within a minute, and only the last 5 files are kept before being deleted.

We have Filebeat running as a DaemonSet in the Kubernetes cluster. It is able to send the logs for the first few minutes, but then it starts to lag behind when reading these files. We have observed delays of about 12 hours, and Filebeat's memory usage grows rapidly.

We have also observed that, due to the frequent log rotations, the registry file keeps growing and the number of open file handles keeps increasing as well.

Below is a snippet of the Filebeat logs.

There are errors in the Filebeat logs as well, which I believe are due to the frequent log rotation.

I'm looking for a Filebeat configuration that will let it read all of these log files before they are deleted.

Thanks in Advance.

Indeed, the errors are due to the quick log rotation. There is little we can do about it in Filebeat; the root cause of the problem is usually that Elasticsearch cannot keep up with the constant flow of events from Filebeat. Please also look at your Elasticsearch instance and, if there are any issues, fix them or give the instance more resources.
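That said, the few Filebeat-side knobs that sometimes help with fast-rotating files are a hard cap on harvester lifetime and a larger internal queue, so batches to the output stay full while it catches up. This is only a rough sketch with illustrative values, not something tuned to your setup:

filebeat.inputs:
- type: container
  paths:
    - "/var/lib/docker/containers/*/*.log"
  close_timeout: 5m        # force a harvester to close after 5 minutes, freeing handles on rotated/deleted files
  harvester_limit: 100     # cap how many files are read in parallel per input

queue.mem:
  events: 8192             # default is 4096; a larger queue lets Filebeat build bigger batches for the output
  flush.min_events: 2048
  flush.timeout: 1s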

Also, could you please share your Filebeat configuration, so we can fine-tune it for your use case?

Hi,

Appreciate your fast response.

I don't see any contention in Logstash or in Elasticsearch. I tried increasing the number of Logstash pods and also increased their CPU limits. I also tried increasing the refresh_interval for the index and setting replicas to zero so that ingestion is faster, but no luck so far.

Filebeat is able to send logs at a decent rate for 5 to 10 minutes after a restart, but eventually it lags as the number of open files keeps increasing. If Elasticsearch were causing the contention, I don't think Filebeat would be able to send logs for 15 minutes after a restart. On the nodes where log rotation is not that fast, it's working fine.

Below is the Filebeat configuration:

- type: container
  containers.ids:
  - "*"
  paths:
    - "/var/lib/docker/containers/*/*.log"
  multiline.pattern: '^\[|^{|^\(|^[t]=|^text|^ERROR|^INFO|^DEBUG|^level=|^[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}|^[0-9]{4}-[0-9]{2}-[0-9]{2}|^\[[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{2}:[0-9]{2}:[0-9]{2}|^[0-9]{2,3}.[0-9]{2,3}.[0-9]{2,3}|^[a-zA-Z]{1}[0-9]{4}|^[a-zA-Z]{3,4}\s[a-zA-Z]{3}|^[a-zA-Z]{2,3}-[a-zA-Z]{2,5}-[0-9a-z]{2,7}'
  multiline.negate: true
  multiline.match: after
  clean_inactive: 61m          #Tried multiple values from hours to minutes 
  ignore_older: 60m            #Tried multiple values from hours to minutes 
  close_inactive: 1m            #Tried multiple values from few minutes to 10 seconds
  clean_removed: true       
  close_removed: true
  processors:
    - add_kubernetes_metadata:
        in_cluster: true
- type: log
  clean_removed: true
  paths:
    - /var/logs/*.log

output.logstash:
  hosts: ["${LOGSTASH_HOST:mon-logstash}:${LOGSTASH_PORT:5044}"] 
  pipelining: 4   #Tried default to 6
  worker: 6       #Again tried from default to 10 
  loadbalance: true   

In addition to the above, I tried the following Filebeat settings as well (a sketch of where these live follows the list):

  1. bulk_max_size
  2. scan_frequency
  3. filebeat.registry.flush
  4. ttl in output.logstash
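For reference, this is roughly where those settings sit in the configuration. The values are just the ones I experimented with, not what I'm running now:

filebeat.registry.flush: 5s          # batch registry writes instead of flushing on every ACK

filebeat.inputs:
- type: container
  scan_frequency: 1s                 # how often the paths are scanned for new files

output.logstash:
  bulk_max_size: 4096                # maximum number of events per batch sent to Logstash
  ttl: 60s                           # periodically re-establish the Logstash connection (see the note on pipelining further down)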

Also, there are a couple of thousand entries in the Filebeat registry file for the rotated log files, since the log files are renamed with the same names.

Logstash: 3 pods, CPU limit 4

Elasticsearch: 3 data pods, CPU limit 4

I have a few workarounds, but I want to know if you are aware of this kind of race condition with Filebeat.

Is there a way I can see if Logstash/Elasticsearch is putting back pressure on Filebeat? Maybe via Filebeat metrics?
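For what it's worth, this is what I would enable on the Filebeat side to look at that myself, assuming its own metrics are the right place; the endpoint and port below are the Beats defaults, nothing specific to my setup:

http.enabled: true               # expose Filebeat's internal stats over a local HTTP endpoint
http.host: localhost
http.port: 5066                  # then: curl http://localhost:5066/stats
                                 # the output/pipeline event counters should show whether events are backing up

logging.metrics.enabled: true    # alternatively, log the same counters periodically
logging.metrics.period: 30s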

Also, it seems the Logstash output is not completely load balanced. I tried setting ttl, but it didn't work as expected. It would be helpful if you could shed some light on that as well.
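My understanding (which may well be wrong, so please correct me) is that ttl only takes effect when pipelining is disabled, i.e. something like the following, which is exactly the kind of thing I would like confirmed:

output.logstash:
  hosts: ["${LOGSTASH_HOST:mon-logstash}:${LOGSTASH_PORT:5044}"]
  loadbalance: true
  pipelining: 0      # the asynchronous (pipelined) client ignores ttl, so pipelining has to be off
  ttl: 60s           # force periodic reconnects so connections get redistributed across the Logstash pods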

Once again, thank you, and I look forward to your response.
