Data loss prevention?

Tuckson · January 20, 2021, 9:07pm

Hi,

Unfortunately I am having issues with my platform. As a result, my Beats sometimes cannot send data to my Logstash/Elasticsearch. They keep retrying, of course, but an outage can last an hour or even two. We are searching for the cause, but in the meantime I also have another issue.

If you look at this:

you can see the number of reporting servers per minute, which is an easy way for me to check that all hosts are sending data.
You can also see a big gap.

Now I am wondering where my data goes when Filebeat cannot send (I see errors in the log). I thought that after reconnecting, Filebeat would send the backlog anyway, but the gap remains.
Here's the default filebeat config I am using:

filebeat.inputs:
- type: log
  enabled: true

  paths:
    - /var/log/server/server.log

  exclude_files: ['\.gz$']

  multiline.pattern: '^ts:'
  multiline.negate: true
  multiline.match: after

  tags: [ "api-log", "apigateway", "asd"]

  ignore_older: 6h
  close_inactive: 5m
  close_removed: true
  clean_removed: true
  clean_inactive: 12h
  scan_frequency: 30s
  harvester_limit: 0

filebeat.config.modules:
  enabled: false

processors:
  - drop_fields:
      fields: ["host"]

fields:
  environment: production

queue.mem:
  events: 4096

output.logstash:
  enabled: true
  hosts: ["server1:5044","server2:5044","server3:5044","server4:5044"]

  loadbalance: true
  timeout: 1m
  slow_start: true
  worker: 4
  bulk_max_size: 4096

logging:
  level: info
  to_files: true
  to_syslog: false
  files:
    path: '/var/log/filebeat'
    name: 'filebeat'
    keepfiles: '3'
    permissions: '0644'
  metrics:
    enabled: false

Does anyone know what I am doing wrong?

BenB196 · January 20, 2021, 9:34pm

I'd suggest you read: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-internal-queue.html. Your configuration only stores the last 4096 events before Filebeat starts dropping events that it can't send. This is why you are probably seeing event/data loss.
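
For reference, a minimal sketch of the memory queue settings described on that page. The numbers are illustrative assumptions, not tuned recommendations; memory use grows with the number of buffered events and with their size, so multiline events will cost more.

```yaml
# Illustrative values only, not tuned recommendations.
queue.mem:
  events: 65536            # maximum number of events the queue can hold
  flush.min_events: 2048   # minimum number of events required before publishing
  flush.timeout: 1s        # maximum wait for flush.min_events to be reached
```

This would replace the existing `queue.mem` block in the posted configuration.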

warkolm · January 20, 2021, 9:37pm

What errors?

Tuckson · January 21, 2021, 8:44am

failed to publish because of connection reset

Tuckson · January 21, 2021, 8:45am

Uhm... I guess I cannot set this to millions?

BenB196 · January 21, 2021, 12:33pm

If you want high retention, I'd suggest using the disk queue instead of the memory queue, as it allows for greater local data retention. (It's in beta, so it's subject to change.)
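
A rough sketch of what that could look like, assuming a Filebeat version that ships the disk queue (7.10 or later). The 10GB limit and the commented-out path are illustrative assumptions; size the limit to cover the longest outage you expect. Only one queue type can be configured, so this takes the place of the `queue.mem` block.

```yaml
# Replaces the queue.mem block; only one queue type may be configured.
queue.disk:
  max_size: 10GB                    # required: upper bound on disk space used by the queue
  # path: /var/lib/filebeat/queue   # optional, hypothetical location; defaults to a
  #                                 # diskqueue directory under Filebeat's data path
```

Events buffered in the disk queue are stored on disk, so they can survive a Filebeat restart as well as a long output outage.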

Tuckson · January 21, 2021, 3:44pm

Ehmm... But I guess this will be much slower. So if I would like to have a disk queue AND performance, I suppose I should use a RAM disk?

BenB196 · January 21, 2021, 3:47pm

I personally try to avoid RAM disks, as they can be weird in production environments. Generally, unless you're generating hundreds or thousands of events per second, even an HDD should be sufficient for the disk queue without much negative impact. (I've never done any benchmarking; I'm just going off past experience.)

system · February 18, 2021, 3:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
