New Feature: reading chunks of data in harvester

Task:
I want to use Filebeat for reading and processing log files that consist of fixed-size structures. In other words, every consecutive 120 bytes in such a file represents a new chunk of data.

I want to read them and slice them into fields using processors.

Idea:
I want to develop a new reader, Chunk, and add it to the harvester's chain of readers:

limit -> (multiline -> timeout) -> strip_newline -> json -> encode -> (line XOR chunk) -> log_file

This reader will yield fixed-size chunks and forward them to the subsequent readers.
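
A minimal sketch of what such a reader could do, assuming a simple Next()-style interface roughly like the existing readers in the chain (the ChunkReader type, its constructor, and the 12-byte demo size are made up for illustration, not Filebeat's actual API):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// ChunkReader yields fixed-size records from an underlying stream.
// It reads exactly `size` bytes per call, which is how a "chunk"
// reader could take the place of the line reader in the chain.
type ChunkReader struct {
	in   io.Reader
	size int
	buf  []byte
}

func NewChunkReader(in io.Reader, size int) *ChunkReader {
	return &ChunkReader{in: in, size: size, buf: make([]byte, size)}
}

// Next returns the next fixed-size chunk. A truncated trailing record
// surfaces as io.ErrUnexpectedEOF, a clean end of stream as io.EOF.
func (r *ChunkReader) Next() ([]byte, error) {
	n, err := io.ReadFull(r.in, r.buf)
	if err != nil {
		return r.buf[:n], err
	}
	out := make([]byte, r.size)
	copy(out, r.buf)
	return out, nil
}

func main() {
	// Three 12-byte records for demonstration (real records would be 120 bytes).
	data := bytes.Repeat([]byte("0123456789AB"), 3)
	cr := NewChunkReader(bytes.NewReader(data), 12)
	for {
		chunk, err := cr.Next()
		if err != nil {
			break
		}
		fmt.Printf("chunk: %q\n", chunk)
	}
}
```

Inside Filebeat the reader would wrap the preceding reader in the chain rather than a plain io.Reader, but the buffering logic would be the same.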


What do you think of this idea? Is it the right approach to solving the initial task? Does anyone else need the capability to read fixed-size structures from log files?

  1. Which service are you trying to monitor?

  2. Is this 'fixed' chunk all ASCII, or is there some binary in there as well?

I was hoping to - one day - make the reader chain configurable. We don't want full parsing support, but chunking and different line splitting/multiline strategies could be implemented and reused in filebeat modules more easily.

Which service are you trying to monitor?

These are SAP Security Audit files.

Is this 'fixed' chunk all ASCII, or is there some binary in there as well?

It's a UTF-16 encoded file. Every 200 characters in the file represent a structure, the first 20 bytes of which look like this:

As you can see, this is text, but not ASCII.
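
For what it's worth, the UTF-16 decoding itself is something Filebeat's existing encoding support is meant to handle; only the fixed-width splitting would be new. A rough sketch of the per-record slicing under that assumption (the sample header, the offsets, and the field names are invented, not the real SAP audit layout):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE converts little-endian UTF-16 bytes into a Go string.
// In Filebeat the encode reader would normally perform this step.
func decodeUTF16LE(b []byte) string {
	u := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		u = append(u, binary.LittleEndian.Uint16(b[i:i+2]))
	}
	return string(utf16.Decode(u))
}

func main() {
	// A fabricated 20-character record header, encoded as UTF-16LE.
	header := "2AU120180101120000AB"
	raw := make([]byte, 0, len(header)*2)
	for _, r := range header {
		raw = append(raw, byte(r), 0) // ASCII subset only, for the demo
	}

	text := decodeUTF16LE(raw)

	// Slice the fixed-width header by character offset; these field
	// boundaries are hypothetical, not the actual SAP format.
	fmt.Println("event id:", text[0:4])
	fmt.Println("date    :", text[4:12])
	fmt.Println("time    :", text[12:18])
	fmt.Println("rest    :", text[18:])
}
```

In practice this kind of field extraction could also live in processors or an ingest pipeline once the chunk reader has produced whole records.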

Do you think it's better not to reinvent the wheel and just wait for this feature?

TBH, I don't think we're going to work on this feature anytime soon.

This is only the second time I've ever seen this use case.

Contributions are very welcome.

Are these files being rotated? I wonder if it would make sense to define a special prospector type instead of modifying the reader chain.
