I want to read an append-format Blob file with filebeat

syo04suke26 · February 27, 2025, 9:15am

[background]
I want to build a data pipeline in which Filebeat collects log data written to Azure Storage and forwards it to Logstash.
The blobs are append blobs, and new log entries are appended approximately every minute. The log output format is determined by the product and cannot be changed.

[problem]
When testing with the above configuration, Filebeat reads the append blob from the beginning every time it detects an append, resulting in duplicate logs being output to Logstash.

[question]
Is it possible to use filebeat to output only the appended portion of a Blob file in Append format?

[my config]

filebeat.inputs:
- type: azure-blob-storage
  enabled: true
  account_name: "xxx"
  auth.shared_credentials.account_key: "xxx"
  containers:
  - name: "yyy"
    file_selectors:
    - regex: 'zzz'
    poll: true
    poll_interval: 60s

strawgate · March 1, 2025, 6:02pm

We do not have specific support for append blobs.

To work around this, I guess you could download the file from Azure Blob Storage at some interval and replace the previous local copy. Filebeat will track its position in the file, and each time you replace it, it will start from the previous offset and read until the end. This will cause you to download the same file many times, and the Azure command line tools do not support delta downloads of append blobs afaik.
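
A variation on this workaround, as a rough sketch only: instead of replacing the whole local copy, append just the missing bytes to it, using the local file's size as the offset into the append blob. The names `sync_appended`, `fetch_from`, and `demo` are hypothetical. In practice `fetch_from` might wrap a ranged read such as `BlobClient.download_blob(offset=offset).readall()` from the `azure-storage-blob` Python SDK, and such a wrapper would probably need to check the blob's current size first, since a range that starts at the end of the blob may be rejected. Here the blob is simulated with an in-memory buffer so the sketch runs on its own.

```python
import os
import tempfile
from typing import Callable, List, Tuple


def sync_appended(local_path: str, fetch_from: Callable[[int], bytes]) -> int:
    """Append the bytes the local mirror is missing; return how many were added."""
    # The size of the local mirror doubles as the offset into the remote blob.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    new_bytes = fetch_from(offset)  # hypothetical ranged read starting at `offset`
    if new_bytes:
        # Append in place so the file on disk is extended rather than recreated.
        with open(local_path, "ab") as fh:
            fh.write(new_bytes)
    return len(new_bytes)


def demo() -> Tuple[List[int], bytes]:
    """Simulate a blob that grows between two sync runs."""
    remote = bytearray(b"line 1\n")  # stand-in for the append blob
    fetch = lambda offset: bytes(remote[offset:])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mirror.log")
        counts = [sync_appended(path, fetch)]      # initial download
        remote += b"line 2\n"                      # the product appends a line
        counts.append(sync_appended(path, fetch))  # only the new line is fetched
        counts.append(sync_appended(path, fetch))  # nothing new this time
        with open(path, "rb") as fh:
            mirrored = fh.read()
    return counts, mirrored
```

Run on a schedule that matches the roughly one-minute append cadence, this keeps a local file growing in step with the blob, which a regular file-based Filebeat input could then tail. Whether Filebeat resumes from its stored offset still depends on how the input identifies the file, so that part would need testing.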

Alternatively, you may be able to set up a notification and use an Azure Function to read the append blob starting at the offset, push the events to Event Hub, and then use the Event Hub input to send them to Elasticsearch.
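
The core of such a function could look roughly like the sketch below. It assumes the logs are newline-delimited UTF-8, which the thread does not confirm, and it leaves out the Azure Function trigger and the Event Hub publishing entirely. `read_new_lines`, `checkpoints`, and `fetch_from` are hypothetical names: `checkpoints` stands in for whatever durable store holds the per-blob offset, and `fetch_from` for a ranged read of the blob.

```python
from typing import Callable, Dict, List


def read_new_lines(
    blob_name: str,
    checkpoints: Dict[str, int],
    fetch_from: Callable[[str, int], bytes],
) -> List[str]:
    """Return the complete lines appended since the last checkpoint."""
    offset = checkpoints.get(blob_name, 0)
    data = fetch_from(blob_name, offset)  # hypothetical ranged read from `offset`
    # Consume only up to the last newline; a partially written trailing
    # line is left in the blob and picked up on the next invocation.
    end = data.rfind(b"\n") + 1
    if end == 0:
        return []
    checkpoints[blob_name] = offset + end
    return data[:end].decode("utf-8").splitlines()
```

Each returned line would then be sent to Event Hub as one event. Advancing the checkpoint only past complete lines means a line that is still being written is not emitted twice or cut in half.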

If you're able to change the application:

If the application eventually rotates to a new log file after appending for a period of time, you could focus your input just on the rotated log files and avoid reading the appended files.
Append blobs are optimized for continuous appending. One write operation per minute isn't a high-throughput append use case, so each write could just go to a new blob file instead. I don't believe Azure pricing is any different between these two scenarios.

Finally, as this is a missing feature in Beats, I would recommend opening an issue in the Integrations repository on GitHub for the azure_blob_storage integration and an issue in the Beats repo on GitHub for the azure-blob-storage input, and having them link to each other.

exdghost · April 9, 2025, 6:37am

@syo04suke26 The reason we do not support detecting appends in blobs is registry size and scalability. To support appends, Filebeat would need to keep track of each blob and its offset so it can resume from that point when it sees the same blob come up during the scheduling process. This can cause the registry size to explode when dealing with millions of blobs at scale. We need to keep the registry as lean as possible for performance reasons.
