Is it possible to detect when filebeat is finished harvesting a file

We are in the process of designing a system in AWS that automatically creates and stands up a new application server instance and, after a configured amount of time, tears it down and rebuilds a new one. We're doing that to make sure the system has the latest information available.

To that end, we want to have Filebeat installed on these new 'ephemeral' instances, writing application log information to a back-end Elasticsearch environment, but we need to make sure that Filebeat has finished processing any log files on the source server before completing the teardown.

Looking in the documentation, I see settings for queueing and flushing, but I'm not seeing anything related to a finished or acknowledgement signal that gets generated and sent out, letting the Filebeat service know that it is safe to shut down or that it is safe to tear the machine down.

Is this possible and what would be the best approach?

Thanks,
Bill

Hi Bill, thanks for your post. I would recommend looking into the Filebeat registry, which has information on the state of the files that Filebeat has processed.

https://www.elastic.co/guide/en/beats/filebeat/current/configuration-general-options.html
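For illustration, here is a minimal sketch that dumps the offsets recorded in the registry and compares them to the files on disk. It assumes the older single-file JSON registry (e.g. `/var/lib/filebeat/registry` in 6.x); newer versions keep a registry directory of NDJSON files instead, so the path and parsing would need to be adapted to your version:

```python
#!/usr/bin/env python3
"""Sketch: compare Filebeat's recorded offsets to the files on disk.

Assumes the older single-file JSON registry layout; adjust the path and
parsing for your Filebeat version and install location.
"""
import json
import os

REGISTRY = "/var/lib/filebeat/registry"  # adjust for your install/version

with open(REGISTRY) as fh:
    entries = json.load(fh)  # list of entries with "source" and "offset"

for entry in entries:
    path, offset = entry["source"], entry["offset"]
    size = os.path.getsize(path) if os.path.exists(path) else None
    caught_up = size is not None and offset >= size
    print(f"{path}: offset={offset} size={size} caught_up={caught_up}")
```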

Let us know if you have further questions.

Filebeat has a verbose mode, and in that mode you can see whether it is done harvesting files.
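As a rough illustration, you could scan the Filebeat log for harvester start/close events. The exact message wording differs between versions, so treat the substrings below as placeholders to adapt to what your log actually shows:

```python
#!/usr/bin/env python3
"""Sketch: count harvester start/close events in the Filebeat log.

The log path and exact message text vary by version; the substrings
below are placeholders, not guaranteed matches.
"""
FILEBEAT_LOG = "/var/log/filebeat/filebeat"  # adjust for your install

started = closed = 0
with open(FILEBEAT_LOG, errors="replace") as fh:
    for line in fh:
        lowered = line.lower()
        if "harvester started" in lowered:
            started += 1
        elif "closing harvester" in lowered or "harvester closed" in lowered:
            closed += 1

print(f"harvesters started={started} closed={closed}")
if started and started == closed:
    print("All harvesters that started have closed.")
```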

Could these logs be sent to Elastic to build out a dashboard for actively tracking the progress of the log(s) being sent? It would be neat to see progress in real time, or at least the updated progress of the data ingested from Filebeat.

@thedude
I mean technically yes, but this would be like logging the system that is doing the logging.

You could do this by having Filebeat write its own logs to a file, reading that file back in, and somehow getting those events sent through Logstash to a separate index. (THIS WOULD BE A LOT OF WORK AND VERY MUCH OVERKILL.)

Here is what I recommend, from easy to hard.

(easy)
Start Filebeat on your EC2 instance, check the Filebeat logs by hand in verbose mode, and see how long it takes. Filebeat tends to be super fast at sending data out.

(harder)
Have your application write its logs to an S3 bucket. Then use a Lambda function to read that data from the S3 bucket and send it to Logstash for parsing/ingestion. Then have some sort of S3 lifecycle policy set on the bucket to roll off old app logs.

Logstash has an input plugin to talk to S3 (I am pretty sure).
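If you do go the Lambda route rather than the S3 input plugin, a minimal sketch of that function could look like the one below. It assumes a Logstash `tcp` input listening on LOGSTASH_HOST:LOGSTASH_PORT and newline-delimited events; the host, port, and protocol are illustrative, not a prescribed setup:

```python
"""Sketch of an S3-triggered Lambda that forwards new log objects to Logstash.

Assumes a Logstash tcp input and newline-delimited events; host/port are
placeholders to adapt to your environment.
"""
import os
import socket

import boto3

s3 = boto3.client("s3")
LOGSTASH_HOST = os.environ.get("LOGSTASH_HOST", "logstash.internal")
LOGSTASH_PORT = int(os.environ.get("LOGSTASH_PORT", "5000"))


def handler(event, context):
    # One S3 put notification can contain multiple records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        with socket.create_connection((LOGSTASH_HOST, LOGSTASH_PORT)) as sock:
            for line in body.splitlines():
                sock.sendall(line + b"\n")  # one event per log line
```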


Thanks for all the great input.

The team writing the teardown of the EC2 instances doesn't want to have to monitor the Filebeat log file itself to see if it's all right to tear down the instance. Ideally, they said, if they could have something like a 'heartbeat' to detect when the queue is flushed, that would be perfect. I told them I wasn't sure that Filebeat had such a functionality. The process needs to be automatic, without any manual intervention.
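One way to approximate that 'heartbeat' without watching logs is a small gate script in the teardown automation that polls the registry and only signals safe-to-terminate once every tracked offset has caught up with the file on disk (as I understand it, the registry is only advanced after the output acknowledges the events, so this is a reasonable proxy for "flushed"). A sketch, again assuming the older single-file JSON registry, with the path, timeout, and parsing as assumptions to adjust for your version:

```python
#!/usr/bin/env python3
"""Sketch of a teardown gate: wait until Filebeat's registry offsets have
caught up with the log files on disk, then exit 0 so automation can proceed.

Assumes the older single-file JSON registry; newer Filebeat versions use a
registry directory of NDJSON files, so the loading step would change.
"""
import json
import os
import sys
import time

REGISTRY = "/var/lib/filebeat/registry"  # adjust for your version/install
TIMEOUT_SECONDS = 600
POLL_SECONDS = 10


def all_caught_up():
    with open(REGISTRY) as fh:
        entries = json.load(fh)
    for entry in entries:
        path, offset = entry["source"], entry["offset"]
        if os.path.exists(path) and offset < os.path.getsize(path):
            return False
    return True


deadline = time.time() + TIMEOUT_SECONDS
while time.time() < deadline:
    if all_caught_up():
        print("Filebeat has caught up; safe to tear down.")
        sys.exit(0)
    time.sleep(POLL_SECONDS)

print("Timed out waiting for Filebeat to catch up.", file=sys.stderr)
sys.exit(1)
```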

Another idea being tossed around is a file shipper or appender that would write the application log files out to a staging area monitored by Filebeat, from which they would then be ingested into Elasticsearch. Someone else suggested using Kafka for this.
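For the staging-area option, the shipper/appender piece could be as simple as the sketch below, which moves only completed (rotated) log files into a directory Filebeat watches; the paths and rotation naming are made up for illustration:

```python
"""Sketch of a trivial appender that moves completed (rotated) app logs into a
staging directory that Filebeat is configured to watch. Paths are illustrative.
"""
import glob
import os
import shutil

APP_LOG_DIR = "/var/log/myapp"              # hypothetical application log dir
STAGING_DIR = "/var/log/filebeat-staging"   # hypothetical dir Filebeat watches

os.makedirs(STAGING_DIR, exist_ok=True)

# Only move rotated files (e.g. app.log.1), never the file still being written.
for path in glob.glob(os.path.join(APP_LOG_DIR, "*.log.*")):
    shutil.move(path, STAGING_DIR)
```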

Opinions / ideas?

Thanks,
Bill

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.