Filestream multiple file reading issue

So here's my case:
I'm using the Custom Logs (Filestream) integration on Elasticsearch. The files I want it to read work as follows:

  1. There are already multiple files that fit the "Paths" config, so it looks like this

Paths:
/path/to/files/logs_to_read-*.ndjson

/path/to/files/ directory:
logs_to_read-one.ndjson
logs_to_read-two.ndjson
logs_to_read-three.ndjson

  2. Whenever new data is to be ingested, a new file is created in the /path/to/files/ directory, so it looks like this:
    /path/to/files/ directory after new data appears:
    logs_to_read-one.ndjson
    logs_to_read-two.ndjson
    logs_to_read-three.ndjson
    logs_to_read-four.ndjson

I want this filestream configuration to:

  1. Read all files that are already in the /path/to/files/ directory
  2. Read any new file that appears in the /path/to/files/ directory

Current Parsers configuration:

- ndjson:
    target: ""
    overwrite_keys: true
    expand_keys: true
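For completeness, in standalone Filebeat YAML the paths and parsers above would sit together under one filestream input, roughly like this (a sketch, not my actual integration policy):

```yaml
filebeat.inputs:
- type: filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson
  parsers:
    # Decode each line as NDJSON into the root of the event
    - ndjson:
        target: ""
        overwrite_keys: true
        expand_keys: true
```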

Current behavior:

  1. No file is being read by Filebeat
  2. There are no errors in the Filebeat logs, and the agent's status is Healthy

I would really appreciate any help.

Hi @BnB,

Welcome! Can you share your input configuration for Filebeat and which version you are using? Are you using the debug options as well to get more information in addition to the logs?

I would expect your filestream input configuration to look a little like this:

filebeat.inputs:
- type: filestream
  id: multi-filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson

Are you using the prospector option as well to look for files? That may be needed too.
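The scanner settings live under the input itself; for example (a sketch with a default-ish value, where `check_interval` controls how often new files matching the glob are discovered):

```yaml
filebeat.inputs:
- type: filestream
  id: multi-filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson
  # How often the prospector re-scans the glob for new files
  prospector.scanner.check_interval: 10s
```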

Let us know!

Hi, thank you for your time.

Filestream version: v1.1.3

Input:

/home/new/*.ndjson

In "Edit Custom Logs (Filestream) integration" there is no such option as a debug option or a prospector, or at least I'm not able to locate them, sorry.

The details on how to configure filebeat with the debug level options are here. Those are not specific to the prospector.
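In a standalone Filebeat, for example, the debug level goes in filebeat.yml, roughly like this (the selector names here are just examples):

```yaml
# filebeat.yml — raise Filebeat's own log verbosity
logging.level: debug
# Optionally narrow the debug output to particular subsystems
logging.selectors: ["input", "harvester"]
```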

Can you share any output you see with the debug option enabled?

Here's the content of today's log file:

Note that I deleted lines containing:

CA certificate matching 'ca_trusted_fingerprint' found, adding it to 'certificate_authorities'

'ca_trusted_fingerprint' set, looking for matching fingerprints

Non-zero metrics in the last 30s

since there were a lot of them, they made the file too big to upload, and they didn't seem relevant.

Thanks @BnB. Ah it's the Agent Custom Filestream Log integration rather than Filebeat specifically. I've removed the filebeat tag to avoid confusion.

I see some healthy messages on the filestream, and some fleet-server errors (Possible transient error during checkin with fleet-server) that should be recoverable. But it looks to be info level rather than debug so it may be worth changing the setting level.

Can you please also share your full configuration?

sorry for the confusion, here comes the configuration:

(screenshot of the integration configuration)
I have noticed in my logs that a harvester is started for the paths "/var/log/messages*", "/var/log/syslog*", and "/var/log/system*", and not for the path I specified in the configuration. Is this normal behavior?

Hi, sorry for bothering you again.

Could these be more helpful/useful?

The /var/log syslog paths above are most likely there because you have the system integration enabled as well. Those are the paths it will read.

With respect to your configuration, try taking out the ndjson parser; it could be failing, and that's why you're getting no logs.

Otherwise it looks pretty good at a glance. Are you using the UI to configure this or are you purely doing a self-managed agent with this configuration?

Also, I'm not sure if that path to the files is the actual path or just a sample, but either way you need to make sure that the entire directory tree and the files themselves are readable.

We often run into folks where the actual file is readable but the parent directories are not, so make sure you check that as well.
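A quick way to sanity-check that is a small shell helper (hypothetical, just a sketch) that walks from the file up through its parent directories and reports the first component the current user can't read or traverse:

```shell
# Report whether a file is readable and every parent directory is traversable.
check_readable() {
  p="$1"
  # The file itself must be readable
  [ -r "$p" ] || { echo "not readable: $p"; return 1; }
  d=$(dirname "$p")
  # Each parent directory needs the execute (traverse) bit
  while [ "$d" != "/" ]; do
    [ -x "$d" ] || { echo "not traversable: $d"; return 1; }
    d=$(dirname "$d")
  done
  echo "ok: $1"
}
```

Run it against one of the actual log files, e.g. `check_readable /path/to/files/logs_to_read-one.ndjson`, as the same user the agent runs as.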

Also as Carly said the debug logs would help not just the info

If you're using the UI, perhaps a couple screenshots

Hi, thanks for the reply.

  1. Deleting the json parser didn't help, sadly.
  2. The path is a sample; I will check the directory permissions once I get access to the server, which should be soon enough.
  3. Unfortunately these are the only "debug" logs I found in the agent. I found there's something called Log4j that's used for logging in Elastic, but I'll need a while to configure it, so if Log4j is able to help I'll come back once it's up and running.

To change the logging level .....

Go to the agent .... and set the log level...
Yeah, not obvious.

Also to get a better view of the logs Go To Discover

Data View : logs-*
KQL Bar: data_stream.dataset : "elastic_agent.filebeat"
Add the message and log.level fields in the display

Okay, I have done that and the logs it started to show me are:
cannot start ingesting from file "path/to/file/file.ndjson": filesize of "path/to/file/file.ndjson" is 99 bytes, expected at least 1024 bytes for fingerprinting: file size is too small for ingestion

Unfortunately, neither turning fingerprinting off nor lowering the fingerprint length helped.

Thanks for sharing @BnB.

The issue is down to an existing filestream limitation where the file needs to be at least 1kB. This is because the default file identity has changed from native to fingerprint in v9, as covered in the release notes. There is already an issue here discussing possible approaches to address this limitation.

There is a way to revert to the 8.x behaviour in the release notes:

To preserve the behaviour from 8.x, set file_identity.native: ~ and prospector.scanner.fingerprint.enabled: false

Can you try setting those options and see if the file is now picked up?
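Applied to the input above, the revert would look roughly like this (a sketch; in the integration UI these would need to go into the advanced/custom YAML section, if it is exposed):

```yaml
- type: filestream
  id: multi-filestream
  paths:
    - /path/to/files/logs_to_read-*.ndjson
  # Revert to the 8.x native (inode/device) file identity
  file_identity.native: ~
  # Stop the scanner from fingerprinting (and size-gating) files
  prospector.scanner.fingerprint.enabled: false
```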

Let us know!

Hi, I've been digging around in the options but I can't find a way to add "file_identity.native: ~" to my configuration. I've looked for the option in the GUI but found nothing, I've looked around for the raw .yml file in the server's filesystem, and I've tried to modify the PUT request made by the GUI, but none of these offered a way to add this one option.

So since last time I've dug around and found out two things:

  1. The filestream documentation deems fingerprint to be the default file identity implementation.
  2. The Custom Logs (Filestream) API reference doesn't mention any other file identity implementation except for fingerprint.

So the other file identity options exist, but they are not accessible through the Elastic Agent's API. While turning fingerprint off is exposed as an option, it probably does not affect anything, since fingerprint is the default anyway (I assume that if no file identity implementation is chosen, Elastic just uses the default).

Additionally, the filestream documentation mentions that "Changing file_identity is only supported from native or path to fingerprint", so even if I had an option to select the file identity I want, I'd need to recreate the Elastic Agent from scratch.

Since I don't want to create a standalone agent, it seems my only options for resolving this issue are:

  1. Filling my files with filler bytes
  2. Waiting for the Agent's API to get updated with the missing file identity options
  3. Waiting for the fingerprint file identity implementation to get updated so it can work with files smaller than 1024 bytes
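Option 1 could be as simple as a small shell helper (hypothetical, and assuming blank lines are skipped by the ndjson parser, so the padding does not create bogus events):

```shell
# Pad a file smaller than 1024 bytes with trailing newlines so the
# fingerprint file identity has enough bytes to ingest it.
pad_file() {
  f="$1"
  size=$(wc -c < "$f")
  if [ "$size" -lt 1024 ]; then
    # Append exactly enough newline bytes to reach 1024
    head -c $((1024 - size)) /dev/zero | tr '\0' '\n' >> "$f"
  fi
}
```

Running something like `pad_file` on each new logs_to_read-*.ndjson as it is created would get every file over the fingerprint threshold; files already at or above 1024 bytes are left untouched.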

For those who come after