Filebeat 5.6.12 issues running in Kubernates


(Akshath Patkar) #1

We are having issues while running filebeat as a deamon set.

File beat version : 5.6.12

  • We are running file beat as a deamonset across our different kubernates clusters.
  • We are running deamonset in its own name sapce (beat namespace)

What we started seeing lately is when ever log files gets rotated under path (we rotated logs every 10MB).
Logs files under path (/var/lib/docker/containers/container_id/container_id.log) rotated when the file reaches 10mb. filebeat detecting the rotation, picking up the new file, but fails to update the offset in the registry file. new file must have offset 0, which filebeat sets in the registry but it doesn’t read file properly (not sure why not). here’s part of registry file regarding onbe pod logs

e0d.log 3856443 {\"inode\":785507,\"device\":2049}"
e0d.log 3346846 {\"inode\":785503,\"device\":2049}"
e0d.log 2450253 {\"inode\":785504,\"device\":2049}"
e0d.log 1847998 {\"inode\":785505,\"device\":2049}"
e0d.log 1361586 {\"inode\":785506,\"device\":2049}"
e0d.log 1031051 {\"inode\":785508,\"device\":2049}"
e0d.log 738757 {\"inode\":785509,\"device\":2049}"
e0d.log 534620 {\"inode\":785510,\"device\":2049}"
e0d.log 295316 {\"inode\":785511,\"device\":2049}"
e0d.log 67009 {\"inode\":785512,\"device\":2049}"

Regards,
Akshath


(Steffen Siering) #2

Can you share filebeat logs? Filebeat prints a message when it picks up or closes a file. Including offset.


(Akshath Patkar) #3

Sure,

Here are the logs

2018/10/23 16:14:27.542783 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 5725
2018/10/23 16:14:29.632566 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 5725
2018/10/23 16:14:31.731292 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 5959
2018/10/23 16:14:33.734611 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 5959
2018/10/23 16:14:47.054499 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 0
2018/10/23 16:14:49.057494 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 0
2018/10/23 16:14:51.061137 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 0
2018/10/23 16:14:53.064741 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 215
2018/10/23 16:14:55.068573 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 215
2018/10/23 16:14:57.071924 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 215
2018/10/23 16:14:59.074826 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 437
2018/10/23 16:15:01.078298 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 437
2018/10/23 16:15:03.080797 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 653
2018/10/23 16:15:05.084802 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 653
2018/10/23 16:15:07.088225 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/***-**********-1_default_**********-dd170d394cbb683206a8128c67d80c688e16f41f69595b2485deb8698beed81f.log, offset: 653

This does looks like its picking up the offset but not updating properly.


(Akshath Patkar) #4

Also Just wanted to check, Is there a way to determine if filebeat is keeping up with log.

If a app writes too much of a log, can we determine if we can keep up with existing filebeat.

Any configs to improve the the filebeat throughput.


(Steffen Siering) #5

I don't see anything in the logs. Can you also share your filebeat configs?

If a app writes too much of a log, can we determine if we can keep up with existing filebeat.

Not sure, you are using a quite old filebeat version. In 6.5 we're adding a metric for reporting the number of open file handles. This is often an indicator of filebeat being 'behind', as filebeat keeps rotated files open. In the future we will try to add metrics of actual file size vs. read offsets, so one can tell how many bytes filebeat is behind.

Any configs to improve the the filebeat throughput.

Throughput does not only depend on filebeat, but even more so on the data sink and intermediate services processing events from filebeat. From experience, filebeat is rarely the bottleneck. But still, it must be tuned in order to 'satisfy' the event consumers.
Some recent blog post on Beats -> Elasticsearch tuning, which makes sense to tune after tuning Elasticsearch.


(Akshath Patkar) #6

Hi Steffen,

Thanks. Yeah i was going through the blog post of Beats -> Elasticsearch tuning

Here is the config

filebeat.yml:
----
filebeat.registry_file: /var/log/filebeat.registry
logging.level: ${LOG_LEVEL:error}
spool_size: 512
idle_timeout: 2s
filebeat.spool_size: 512
filebeat.idle_timeout: 2s

processors:
  - add_cloud_metadata:

fields_under_root: true
fields:
  type: k8s
  program: filebeat_k8s
  hostname: ${FILEBEAT_HOST:${HOSTNAME}}
  k8s_cluster: ${CLUSTER_NAME:unknown}

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/containers/*.log
    symlinks: true
    encoding: utf-8
    harvester_buffer_size: 163840
    scan_frequency: 2s
    ignore_older: 12h
    clean_inactive: 24h
    close_inactive: 10m
    json.message_key: log
    json.keys_under_root: true
    json.add_error_key: true
    tail_files: true
    multiline.pattern: '^(!|\s|{|}|#\d+\s|\(|\)|,|\[\d+\]|javax?\.|io\.|Exception|Caused by:|Traceback|Stack trace:)'
    multiline.negate: false
    multiline.match: after
    multiline.timeout: 2s
    backoff: 1s
    max_backoff: 2s
    backoff_factor: 2

output.logstash:
  hosts: [ "${LOGSTASH_HOST}:${LOGSTASH_PORT}" ]
  loadbalance: true
  worker: 5 
  max_retries: 5
  compression_level: 3
  bulk_max_size: 512
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/ssl/cacert"]
  ssl.certificate: "/etc/ssl/cert"
  ssl.key: "/etc/ssl/key"
  ssl.key_passphrase: ""
  ssl.verification_mode: full
  ssl.supported_protocols: [TLSv1.1, TLSv1.2]
Events:  <none>

(Akshath Patkar) #7

Also i found some thing in logs now

2018/10/24 23:02:53.743802 prospector_log.go:324: INFO File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: /var/log/containers/poc-****-2_default_****-****-71e86e5b4bd6915cb16ee846c369737b99e6c94c20ff343280dff5cde998f26c.log
2018/10/24 23:02:55.744999 prospector_log.go:324: INFO File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: /var/log/containers/poc-****-2_default_****-****-71e86e5b4bd6915cb16ee846c369737b99e6c94c20ff343280dff5cde998f26c.log
2018/10/24 23:02:57.746367 prospector_log.go:324: INFO File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: /var/log/containers/poc-****-2_default_****-****-71e86e5b4bd6915cb16ee846c369737b99e6c94c20ff343280dff5cde998f26c.log
2018/10/24 23:02:59.747627 prospector_log.go:324: INFO File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: /var/log/containers/poc-****-2_default_****-****-71e86e5b4bd6915cb16ee846c369737b99e6c94c20ff343280dff5cde998f26c.log
2018/10/24 23:03:01.749090 prospector_log.go:324: INFO File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: /var/log/containers/poc-****-2_default_****-****-71e86e5b4bd6915cb16ee846c369737b99e6c94c20ff343280dff5cde998f26c.log
2018/10/24 23:03:02.589197 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=3 libbeat.logstash.publish.read_bytes=315 libbeat.logstash.publish.write_bytes=49082 libbeat.logstash.published_and_acked_events=384 libbeat.publisher.published_events=384 publish.events=384 registrar.states.update=384 registrar.writes=3

Does that mean we are falling behind and kubernates keeps rotating logs

Regards,
Akshath


(Steffen Siering) #8

The message says filebeat is still processing a file, whereas it would have been ignored due to the 'ignore_older' setting.

Might be an indicator for filebeat falling behind. Your ignore older setting is 12 hours. no idea when the file is first picked up, though.

Have you checked the number of open file descriptors?


(Akshath Patkar) #9

Yes, i checked the open file descriptors keep increasing.

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5216 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5248 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5632 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5600 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5664 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5664 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5632 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5664 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5696 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5664 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5664 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5696 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5696 0 1535701

[root@filebeat-9g9fr /]# sysctl fs.file-nr

fs.file-nr = 5728 0 1535701

(Steffen Siering) #10

OK, this is a clear sign filebeat lagging behind. The offsets increase in your logs, because the old files are still open. The old logs are treated by filebeat as separate files, that's what the inode is for.

You want to have a look in tuning logstash and Elasticsearch, so to increase throughput.

Switching to a more recent filebeat (event serialization + async publishing 6.x releases helps a little) + properly tuning the queue size and output can also help (your output settings are not as effective as you might think, due to the spool settings in 5.6 limiting the total number of events being forwared to the outputs.


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.