Filebeat lost data


#1

Hi all,

We have experienced a data-loss issue with filebeat > logstash.

Our Logstash server was down for 4 hours and some events have not been indexed.

The filebeat configuration is as follows:

scan_frequency: 30s
ignore_older: 10m
close_eof: true
close_removed: true
clean_removed: true
clean_inactive: 15m

The files to index are read only once, because they are never updated after creation.

I have identified some files that have not been indexed but I do not see them in filebeat logs (whereas I can see all other files).

What is the exact retry policy of filebeat?

Can our configuration lead to data loss?

Thank you.


(Steffen Siering) #2

filebeat uses at-least-once semantics. Have you checked the files' state in the registry?

A file might not be picked up if it has been deleted via log rotation and filebeat is restarted thereafter, or if filebeat has closed the file and can no longer pick it up due to ignore_older.


#3

What do you mean by at-least-once semantics?

The "missing" files are not in the registry.
It seems that filebeat never saw these files: there is no trace of them in the log file nor in the registry.

Could it be due to broken communication with logstash?


(Steffen Siering) #4

What do you mean by at-least-once semantics?

That is, on failure (e.g. a missing ACK from LS), filebeat will retry -> harvesters become blocked because filebeat's internal buffers fill up.
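This retry behaviour can be sketched with a toy model (pure illustration, not filebeat's actual code; `Output`, `drain` and all names here are invented): batches are retried until the sink ACKs them, so events are never dropped on failure — they stay buffered, and the producer blocks.

```python
from collections import deque

class Output:
    """Toy sink that fails to ACK while 'down', mimicking an unreachable Logstash."""
    def __init__(self):
        self.down = True
        self.received = []

    def publish(self, batch):
        if self.down:
            return False  # no ACK -> caller must retry
        self.received.extend(batch)
        return True

def drain(buffer, output, max_retries):
    """Retry the oldest batch until ACKed: at-least-once, never drop on failure."""
    sent = 0
    while buffer:
        batch = buffer[0]
        for _ in range(max_retries):
            if output.publish(batch):
                buffer.popleft()
                sent += len(batch)
                break
        else:
            return sent  # still failing: keep the batch buffered (producer blocks)
    return sent

out = Output()
buf = deque([["e1", "e2"], ["e3"]])
print(drain(buf, out, max_retries=3))  # 0 -> output down: nothing sent, nothing lost
out.down = False
print(drain(buf, out, max_retries=3))  # 3 -> output back: all events delivered
```

The key point of the sketch: failure never discards a batch, it only stalls the pipeline, which is why a long Logstash outage back-pressures the harvesters instead of losing already-read events.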

It seems that filebeat never saw these files: there is no trace of them in the log file nor in the registry.

Could it be due to broken communication with logstash?

Maybe you want to share the full configuration, logs, and registry file with us? Given the information I have so far, I'd assume it's due to ignore_older plus clean_inactive: clean_inactive removes entries from the registry, and due to ignore_older those old files are then not picked up again...
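One plausible timeline for this hypothesis, annotated against the settings shared above (my own reading; the timings are illustrative, not taken from any log):

```yaml
scan_frequency: 30s
ignore_older: 10m     # files unmodified for >10m are not picked up if unknown to the registry
clean_inactive: 15m   # registry entries dropped 15m after the file was last harvested
# t = 0     Logstash goes down; pending batches get no ACK, publishing blocks
# t > 10m   files created during the outage age past ignore_older before being read
# t > 15m   clean_inactive drops registry entries for files already read and closed
# t ~ 4h    Logstash is back; files now older than 10m are silently skipped
```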


(ruflin) #5

What version of filebeat are we talking about?


#6

So when filebeat is in retry mode, harvesters end up blocked because the buffers fill up.

And if this situation lasts long (longer than clean_inactive, which should itself be longer than ignore_older), new files can simply be ignored.

Is that a good summary of the situation?

@ruflin I am using filebeat 5.4.1


(Steffen Siering) #7

Not sure it's that simple, as filebeat behavior also depends on some other settings in your configuration file. Please share filebeat configuration.


#8

Here is the configuration, but there is nothing more than what I have previously shared:

filebeat.prospectors:

- input_type: log
  paths:
    - D:\_data\aaa\bbb\*.csv
  encoding: utf-8
  document_type: foo
  scan_frequency: 30s
  ignore_older: 10m
  close_eof: true
  close_removed: true
  clean_removed: true
  clean_inactive: 15m

output.logstash:
  hosts: ["logstash-val:5044"]

logging.level: info
logging.to_files: true
logging.files:
  path: D:\logs
  name: filebeat
  rotateeverybytes: 10485760 # = 10MB
  keepfiles: 10

(ruflin) #9

Could you try to set close_removed to false?


#10

The problem is I cannot easily test this scenario again.
It occurred in a production environment while our logstash server was down for patching.
It is hard to reproduce that environment in our validation stage.
Why do you think that close_removed could change something? The missing files have not been removed.
I think @steffens' explanations are correct. Harvesters were blocked for too long, and as ignore_older is quite short (10 minutes), new files were ignored.


(ruflin) #12

Sorry for the really late reply; this somehow slipped through the cracks. You are right: in case the files are not removed, close_removed would not have an effect. As you haven't set a harvester_limit, I would expect the harvesters to still pick up the new files, preventing ignore_older from applying. I now wonder if close_inactive could apply, close the file, and then clean_inactive could happen.

It would be really interesting to see the log file from when this happens, as it would show the steps and logic that filebeat applied.


#13

No problem! Thank you for answering.
As I said, it is not easy to simulate this behavior again. There is no patching activity scheduled on our logstash server (and I cannot stop it! :wink:) so we just have to wait...
I have changed the ignore_older setting to 2h, as I think this parameter comes into play in this issue. But maybe I'm wrong...

If the issue happens again, I will send you the logs !
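For reference, a sketch of what the revised prospector settings would look like (the 3h value for clean_inactive is my own suggestion, chosen only because filebeat requires clean_inactive to be greater than ignore_older + scan_frequency):

```yaml
- input_type: log
  paths:
    - D:\_data\aaa\bbb\*.csv
  scan_frequency: 30s
  ignore_older: 2h      # now long enough to ride out a multi-hour Logstash outage
  close_eof: true
  close_removed: true
  clean_removed: true
  clean_inactive: 3h    # must stay greater than ignore_older + scan_frequency
```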


(ruflin) #14

@oguachard Thanks a lot, appreciate it.


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.