Filestream multiple file reading issue

So here's my case:
I'm using the Custom Logs (Filestream) integration on Elasticsearch. The files I want it to read work as follows:

  1. There are already multiple files that fit the "Paths" config, so it looks like this

Paths:
/path/to/files/logs_to_read-*.ndjson

/path/to/files/ directory:
logs_to_read-one.ndjson
logs_to_read-two.ndjson
logs_to_read-three.ndjson

  2. Whenever new data is to be ingested, a new file is created in the /path/to/files/ directory, so it looks like this:
    /path/to/files/ directory after new data appears:
    logs_to_read-one.ndjson
    logs_to_read-two.ndjson
    logs_to_read-three.ndjson
    logs_to_read-four.ndjson

I want this filestream configuration to:

  1. Read all files that are already in the /path/to/files/ directory
  2. Read any new file that appears in the /path/to/files/ directory

Current Parsers configuration:

- ndjson:
    target: ""
    overwrite_keys: true
    expand_keys: true
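For completeness, in standalone Filebeat YAML the paths and parsers above would sit together under one filestream input, roughly like this (a sketch, not my actual integration policy):

```yaml
filebeat.inputs:
- type: filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson
  parsers:
    # Decode each line as NDJSON into the root of the event
    - ndjson:
        target: ""
        overwrite_keys: true
        expand_keys: true
```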

Current behavior:

  1. No file is being read by Filebeat
  2. There are no errors in the Filebeat logs, and the agent's status is Healthy

I would really appreciate any help.

Hi @BnB,

Welcome! Can you share your input configuration for Filebeat and which version you are using? Are you using the debug options as well to get more information in addition to the logs?

I would expect your filestream input configuration to look a little like this:

filebeat.inputs:
- type: filestream
  id: multi-filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson

Are you using the prospector option as well to look for files? That may be needed too.
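The scanner settings live under the input itself; for example (a sketch with a default-ish value, where `check_interval` controls how often new files matching the glob are discovered):

```yaml
filebeat.inputs:
- type: filestream
  id: multi-filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson
  # How often the prospector re-scans the glob for new files
  prospector.scanner.check_interval: 10s
```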

Let us know!

Hi, thank you for your time.

Filestream version: v1.1.3

Input:

/home/new/*.ndjson

In "Edit Custom Logs (Filestream) integration" there is no such option as a debug option or a prospector, or at least I'm not able to locate them, sorry.

The details on how to configure filebeat with the debug level options are here. Those are not specific to the prospector.
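In a standalone Filebeat, for example, the debug level goes in filebeat.yml, roughly like this (the selector names here are just examples):

```yaml
# filebeat.yml — raise Filebeat's own log verbosity
logging.level: debug
# Optionally narrow the debug output to particular subsystems
logging.selectors: ["input", "harvester"]
```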

Can you share any output you see with the debug option enabled?

Here's the content of today's log file:

Note that I deleted lines containing:

CA certificate matching 'ca_trusted_fingerprint' found, adding it to 'certificate_authorities'

'ca_trusted_fingerprint' set, looking for matching fingerprints

Non-zero metrics in the last 30s

since there were a lot of them, they made the file too big to upload, and they didn't seem relevant.

Thanks @BnB. Ah it's the Agent Custom Filestream Log integration rather than Filebeat specifically. I've removed the filebeat tag to avoid confusion.

I see some healthy messages on the filestream, and some fleet-server errors (Possible transient error during checkin with fleet-server) that should be recoverable. But it looks to be info level rather than debug so it may be worth changing the setting level.

Can you please also share your full configuration?

sorry for the confusion, here comes the configuration:

(screenshot of the integration configuration)
I have noticed in my logs that a harvester is started for the paths "/var/log/messages*", "/var/log/syslog*", and "/var/log/system*", and not for the path I specified in the configuration. Is this normal behavior?

Hi, sorry for bothering you again.

Could these be more helpful/useful?

The /var/log syslog paths above are most likely there because you have the system integration enabled as well. Those are the paths it will read.

With respect to your configuration, try taking out the ndjson parser; it could be failing, and that's why you're getting no logs.

Otherwise it looks pretty good at a glance. Are you using the UI to configure this or are you purely doing a self-managed agent with this configuration?

Also, I'm not sure if that path to the files is the actual path or just a sample, but either way you need to make sure that the entire directory tree and the files themselves are readable.

We often run into folks where the actual file is readable but the parent directories are not, so make sure you check that as well.
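A quick way to sanity-check that is a small shell helper (hypothetical, just a sketch) that walks from the file up through its parent directories and reports the first component the current user can't read or traverse:

```shell
# Report whether a file is readable and every parent directory is traversable.
check_readable() {
  p="$1"
  # The file itself must be readable
  [ -r "$p" ] || { echo "not readable: $p"; return 1; }
  d=$(dirname "$p")
  # Each parent directory needs the execute (traverse) bit
  while [ "$d" != "/" ]; do
    [ -x "$d" ] || { echo "not traversable: $d"; return 1; }
    d=$(dirname "$d")
  done
  echo "ok: $1"
}
```

Run it against one of the actual log files, e.g. `check_readable /path/to/files/logs_to_read-one.ndjson`, as the same user the agent runs as.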

Also as Carly said the debug logs would help not just the info

If you're using the UI, perhaps a couple screenshots

Hi, thanks for the reply.

  1. Deleting the json parser didn't help, sadly.
  2. The path is a sample; I will check the directory permissions once I get access to the server, which should be soon enough.
  3. Unfortunately these are the only "debug" logs I found in the agent. I found there's something called Log4j that's used for logging in Elastic, but I'll need a while to configure it, so if Log4j is able to help I'll come back once it's up and running.

To change the logging level .....

Go to the agent .... and set the log level...
Yeah, not obvious.

Also to get a better view of the logs Go To Discover

Data View : logs-*
KQL Bar: data_stream.dataset : "elastic_agent.filebeat"
Add the message and log.level fields in the display

Okay, I have done that and the logs it started to show me are:
cannot start ingesting from file "path/to/file/file.ndjson": filesize of "path/to/file/file.ndjson" is 99 bytes, expected at least 1024 bytes for fingerprinting: file size is too small for ingestion

Unfortunately, neither turning fingerprinting off nor lowering the fingerprint length helped.

Thanks for sharing @BnB.

The issue is down to an existing filestream limitation where the file needs to be at least 1kB. This is because the default file identity has changed from native to fingerprint in v9, as covered in the release notes. There is already an issue here discussing possible approaches to address this limitation.

There is a way to revert to the 8.x behaviour in the release notes:

To preserve the behaviour from 8.x, set file_identity.native: ~ and prospector.scanner.fingerprint.enabled: false

Can you try setting those options and see if the file is now picked up?
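Applied to the input above, the revert would look roughly like this (a sketch; in the integration UI these would need to go into the advanced/custom YAML section, if it is exposed):

```yaml
- type: filestream
  id: multi-filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson
  # Revert to the 8.x native (inode/device) file identity
  file_identity.native: ~
  # Stop the scanner from fingerprinting (and size-gating) files
  prospector.scanner.fingerprint.enabled: false
```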

Let us know!

Hi, I've been digging around in the options but I can't find a way to add "file_identity.native: ~" to my configuration. I've looked for the option in the GUI but found nothing, I've looked around for the raw .yml file in the server's filesystem, and I've tried to modify the PUT request made by the GUI, but none of these offered a way to add this one option.

So since last time I've dug around and found out two things:

  1. The filestream documentation deems fingerprint to be the default file identity implementation.
  2. The Custom Logs (Filestream) API reference doesn't mention any other file identity implementation except for fingerprint.

So the other file identity options exist, but they are not accessible through the Elastic Agent's API. While turning fingerprint off is exposed as an option, it probably does not affect anything, since fingerprint is the default anyway (I assume that if no file identity implementation is chosen, Elastic just uses the default).

Additionally, the filestream documentation mentions that "Changing file_identity is only supported from native or path to fingerprint", so even if I had an option to select the file identity I want, I'd need to recreate the Elastic Agent from scratch.

Since I don't want to create a standalone agent, it seems my only options for resolving this issue are:

  1. Filling my files with filler bytes
  2. Waiting for the Agent's API to get updated with the missing file identity options
  3. Waiting for the fingerprint file identity implementation to get updated so it can work with files smaller than 1024 bytes
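Option 1 could be as simple as a small shell helper (hypothetical, and assuming blank lines are skipped by the ndjson parser, so the padding does not create bogus events):

```shell
# Pad a file smaller than 1024 bytes with trailing newlines so the
# fingerprint file identity has enough bytes to ingest it.
pad_file() {
  f="$1"
  size=$(wc -c < "$f")
  if [ "$size" -lt 1024 ]; then
    # Append exactly enough newline bytes to reach 1024
    head -c $((1024 - size)) /dev/zero | tr '\0' '\n' >> "$f"
  fi
}
```

Running something like `pad_file` on each new logs_to_read-*.ndjson as it is created would get every file over the fingerprint threshold; files already at or above 1024 bytes are left untouched.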

For those who come after