Backfilling old logs with filebeat

Hi there,

I have a question. I am trying to backfill some old logs into our ELK stack. To do that, I created a simple configuration file like this:

filebeat:
  prospectors:
    -
      paths:
        - "-"
      input_type: stdin
      document_type: nginx
      fields_under_root: true
      fields:
        environment: staging
output:
  logstash:
    hosts: ["example.com:5044"]

I then zcat my old log file and pipe the result into filebeat telling it to use this configuration file.

The command I am using looks like this:

sudo zcat /var/log/nginx/access.log-20160101.gz | filebeat -e -v -c /etc/filebeat-stdin-nginx.yml

My issue is that Filebeat keeps running once it has reached the end of the input, waiting for new lines on stdin, I suppose. Is there a way to tell it to stop?

I want to write some scripts to perform the backfilling. This behavior makes it hard to write them. Maybe I should backfill logs another way.

Thanks for your awesome work.

Instead of changing Filebeat to shut down when the write end of a stdin pipe is closed, perhaps reading gzipped files should be supported out of the box. Then you wouldn't have to use zcat, and the processing of a file would be resumable.

Yes, it is indeed difficult to write backfilling scripts when Filebeat doesn't stop. I think it would make more sense to add another input type to Filebeat for backfilling gzipped log files. Could you please open a feature request on GitHub?

FYI, I created a GitHub issue here: https://github.com/elastic/beats/issues/637

I'm going to assume this enhancement won't be done for a while, though. I should probably come up with another workaround in the meantime, because I need to preserve the names of the files, which seems to be a problem when piping through stdin.

Your zcat idea works great though. I'll definitely be using something similar.

EDIT: Do you know if there's a way to write plugins or something similar for Filebeat?
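
For the file-name concern above, one possible interim workaround is to decompress the archives into a scratch directory and let a regular log prospector read the copies, rather than piping through stdin. The following is only a minimal sketch under that assumption: `stage_archives` is a hypothetical helper name, not part of Filebeat, and the directories in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of Filebeat): decompress every *.gz file in
# src_dir into dest_dir, keeping each file's base name minus the .gz suffix.
# The original archives are left untouched.
stage_archives() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    for archive in "$src_dir"/*.gz; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$archive" ] || continue
        name=$(basename "$archive" .gz)
        gzip -dc "$archive" > "$dest_dir/$name" || return 1
    done
}
```

A call such as `stage_archives /var/log/nginx /tmp/backfill/nginx` would leave one uncompressed file per archive in the scratch directory. A prospector's `paths` entry would then need to point at that directory, and the events would carry the staged path rather than the original one, so only the base name is preserved.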

Thanks for making this post, it was really helpful for related and unrelated learning.

I'm doing some experimental work to educate myself about this stack. I scp'ed some server log folders down to my machine and have an ELK Vagrant box. Filebeat is installed on the host machine and forwards the logs to the Vagrant box.

I am trying to pump in the exact same files again and again after deleting the Logstash indices in Elasticsearch. I was pretty sure it worked the first few times I tried it, but I don't know for sure. I simply deleted the registry file and restarted the Filebeat service.

Should this work? Should deleting the registry file allow me to backfill the logs that are already sent? Must I really cat/zcat and pipe to the filebeat binary?

Should deleting the registry file allow me to backfill the logs that are already sent?

Yes. Shutting down Filebeat before deleting the file is probably a good idea.
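
The "stop first, then delete" order can be sketched as a small guard. This is an illustration only: `reset_registry` is a hypothetical helper, the process name `filebeat` is an assumption about how the binary shows up in the process list, and the check is skipped when `pgrep` is not installed.

```shell
#!/bin/sh
# Hypothetical helper: delete the registry file only while no process named
# "filebeat" is running, so a live Filebeat cannot write it back afterwards.
reset_registry() {
    registry=$1
    if command -v pgrep >/dev/null 2>&1 && pgrep -x filebeat >/dev/null 2>&1; then
        echo "filebeat is still running; stop it before deleting $registry" >&2
        return 1
    fi
    rm -f -- "$registry"
}
```

The registry path has to be passed in explicitly, for example `reset_registry .filebeat` when Filebeat is run from its own directory; a packaged install may keep the file somewhere else.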

Absolutely. I have tried this for hours. It did work, then it stopped.

The registry file is written to when the service is shut down, so I wait until that happens before I remove it. However, Filebeat somehow still knows what it has already forwarded when I start it again.

I have tried stopping the service and simply running the binary in order to simplify the experiment. The effect is the same.

Is there definitely no other way for filebeat to know what is sent other than the registry file?

EDIT: A caveman workaround of course is to simply copy all the logs into another folder, update the filebeat config and restart the service. However, the registry being deleted hasn't helped me with this issue.

EDIT2: I ended up reinstalling Filebeat and am now getting the expected behaviour again, i.e. I delete the registry and push all the logs in again. It doesn't make any sense, so I guess I must have messed something up while debugging.

Thanks for the help anyway :)

Filebeat has an option, ignore_older, which is set to 24h by default. If a file is older than that, Filebeat will not process it. You can try the nightly builds, which set ignore_older to infinite by default and introduce close_older for closing unchanged files.
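
To check which files the 24h default described above might skip, something like the following could help. It is a rough sketch: `list_stale_logs` is a hypothetical helper, and it assumes the modification time is what matters, which matches the `touch` advice in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under log_dir whose last
# modification is at least 24 hours old (POSIX find: -mtime +0).
list_stale_logs() {
    log_dir=$1
    find "$log_dir" -type f -mtime +0
}
```

Any file printed by `list_stale_logs ~/tmp/test/logs` is a candidate for being ignored until its timestamp is refreshed or ignore_older is raised.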

You can run filebeat with debug output any time: -v -d '*'

Absolutely. Thanks for the reply. I saw the ignore_older setting, but I managed to get it all going after copying the logs to a different location and removing the registry. Weirdly, I also had to reinstall Filebeat.

I was tailing the logs on both ends, Filebeat and Logstash, and watching the registry folder with 'watch ls'.

This could mean a number of things, so this post is of little value to the forum. If it happens again, I will try to be more scientific in my approach.

You can just use touch to update the file timestamps. After you delete the registry, the complete file should be processed.

For my tests I have a script that basically does:

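#!/bin/sh

# Refresh the timestamps so that ignore_older does not skip the files
touch ~/tmp/test/logs/*
# Delete the registry so that Filebeat forgets what it has already sent
rm -f .filebeat
# Run Filebeat with the config file passed as the first argument
./filebeat -e -v -c "$1"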

Nice tips. Thanks!