Parsing JSON format with filebeat

Hi
I am having a little trouble understanding how JSON parsing works when using Filebeat as a collector. I have gone through a few forum posts and docs and can't seem to get things looking right.

Currently the format of the string more or less looks like this:

{"timestamp":"2024-11-13T07:32:51.065840Z","level":"DEBUG","fields":{"message":"🔚 Dropping engine manager"},"filename":"modules/machine_vision/crates/machine_learning/src/engines/yolo.rs"}

There are also logs with a few more keys within fields, but this is the most basic format.
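
For example, a line with an extra key under fields might look like this (values made up, just to show the shape):

{"timestamp":"2024-11-13T07:32:51.065840Z","level":"INFO","fields":{"message":"Loaded plugin","plugin":"yolo"},"filename":"modules/machine_vision/crates/machine_learning/src/engines/yolo.rs"}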

I have done a lot of playing around with the filebeat config, and it currently looks like this:

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}


output.logstash:
   hosts: ["${user.graylog_host}:5044"]
path:
   data: ${sidecar.spoolDir!"/var/lib/graylog-sidecar/collectors/filebeat"}/data
   logs: ${sidecar.spoolDir!"/var/lib/graylog-sidecar/collectors/filebeat"}/log

filebeat.inputs:

- type: filestream
  id: dynamic-file-tracker
  paths:
    - /home/**/logs/**/
  
  parsers:
    - ndjson:
        target: ""
        add_error_key: true
        overwrite_keys: true
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: fields.message
  json.overwrite_keys: true
  

processors:
  - decode_json_fields:
      fields: ["timestamp", "fields", "filename", "level"]
      process_array: false
      max_depth: 2
      target: ""
      overwrite_keys: true
      add_error_key: true
  

close_inactive: 5m              
scan_frequency: 10s 
     

The files are saved as system files and not with the .json extension.

And the output fields I get look like this:

The things I am trying to fix:

  1. The timestamp is coming from the time the log is read, not from the log itself, and I want to be able to replace it with the one from the log
  2. I am trying to replace the 'message' field with the 'field_message' field.
  3. When there are more fields within the "fields" key, I want them to be dissected as their key only, without "fields" as a prefix. E.g. the plugin field lives within "fields" and is displayed as "fields_plugin" rather than just plugin (this one I have to fix with pipelines, so the first two are the most important)
  4. I am trying to have this constantly import new logs as new lines are added to the file, to avoid repeated information

Any help in the right direction would be amazing

The fields option here lists the fields containing JSON that needs to be decoded. I don't think this processor actually does anything at the moment, as the ndjson parser on the input is what's doing the JSON decoding:

- type: filestream
  id: dynamic-file-tracker
  paths:
    - /home/**/logs/**/
  
  parsers:
    - ndjson:
        target: ""
        add_error_key: true
        overwrite_keys: true

So I'd go ahead and remove the decode_json_fields processor as I don't think it's accomplishing anything.

For your specific questions -- your resultant document has both a @timestamp field and a timestamp field. To fix problem 1 and to pull the timestamp into @timestamp you'll use the Timestamp processor:

processors:
  - timestamp:
      field: "timestamp"

To swap field_message into message you'll first use the drop_fields processor and then use the rename processor. You have to use drop_fields first because rename cannot overwrite an existing field:

processors:
  - drop_fields:
      fields: ["message"]
  - rename:
      fields:
        - from: "field_message"
          to: "message"

Now, as I mention below, I don't actually believe that the field is called field_message right now; I think it's actually fields.message, and the underscore is something Graylog adds when it replaces periods with underscores. So you may also want to try:

processors:
  - drop_fields:
      fields: ["message"]
  - rename:
      fields:
        - from: "field.message"
          to: "message"

For 3, the documents coming out of Beats keep the nested structure, so they will look like "fields": {"message": "my message"}. I think the underscores are something that Graylog is doing when it gets the structured object.

You can confirm this by using the console output (Configure the Console output | Filebeat Reference [8.16] | Elastic), like this:

output.console:
  pretty: true

This will print the parsed message to the screen so you can see what it looks like before it gets processed by Graylog.
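
If it helps, you can run Filebeat in the foreground with the config you're testing (outside the sidecar, adjust the config path to wherever yours lives) so the events are printed straight to your terminal:

filebeat -e -c /etc/filebeat/filebeat.yml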

I'm not sure I follow number 4, but using a wildcard in the filestream paths will ensure that new files placed on disk that match the glob get picked up.

Hi
Thank you so much, I was able to get the message fixed, which makes it easier to read. For the timestamp part, '@timestamp' is now being updated to the correct one from the log, but the 'timestamp' field is still showing the time it gets ingested.

It might be confusing to leave it like this, so is there a fix for this? I have tried dropping the field inside the Filebeat config, which breaks the config, as well as in a Graylog pipeline; neither works.

- drop_fields:
        fields: ["message", "timestamp"]

For part 4 of my previous questions: if the same file is replaced with one of the same name but with different content, how do you get Filebeat to read only the changes? Or should it be doing that automatically?

Thanks for all your help

Is the drop_fields processor after the timestamp processor? Can you share the full config you're running?

For part 4, by default, Filebeat identifies files based on their inodes and device IDs, not the file path. So if the file is rotated (renamed) and then a new file is started, Filebeat should handle this automatically.

If you run into issues with this you can always turn on fingerprinting:

fingerprint:
  enabled: true
  offset: 0
  length: 1024

Instead of relying on the device ID and inode values when comparing files, Filebeat hashes the first 1024 bytes of the file and uses that as a file ID. When the file gets replaced, the first 1024 bytes will hash to a different value and Filebeat will treat the replaced file as a new file and read it from the beginning.
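
For reference, on recent 8.x Filebeat the fingerprint settings live under the filestream input's prospector scanner, and you also switch the file identity over to the fingerprint (going from memory here, so double-check against the filestream docs):

- type: filestream
  id: dynamic-file-tracker
  paths:
    - /home/**/logs/**/
  # hash the first 1024 bytes instead of keying on device ID + inode
  prospector.scanner.fingerprint.enabled: true
  prospector.scanner.fingerprint.offset: 0
  prospector.scanner.fingerprint.length: 1024
  file_identity.fingerprint: ~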

Hi

Here is the current config file

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}


output.logstash:
   hosts: ["${user.graylog_host}:5044"]
path:
   data: ${sidecar.spoolDir!"/var/lib/graylog-sidecar/collectors/filebeat"}/data
   logs: ${sidecar.spoolDir!"/var/lib/graylog-sidecar/collectors/filebeat"}/log

filebeat.inputs:

- type: filestream
  id: dynamic-file-tracker
  paths:
    - /home/ias/logs/**/
  
  parsers:
    - ndjson:
        target: ""
        add_error_key: true
        overwrite_keys: true
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: fields.message
  json.overwrite_keys: true

processors:
    - timestamp:
          field: "timestamp"
          layouts:
            - '2006-01-02T15:04:05Z'
            - '2006-01-02T15:04:05.999Z'
            - '2006-01-02T15:04:05.999-07:00'
          test:
            - '2019-06-22T16:33:51Z'
            - '2019-11-18T04:59:51.123Z'
            - '2020-08-03T07:10:20.123456+02:00'
    - drop_fields:
        fields: ["message", "timestamp" ]
    - rename:
        fields:
          - from: "fields.message"
            to: "message"
    - rename:
        fields:
        - from: "collector_node_id"
          to: "source"

close_inactive: 5m              
scan_frequency: 1s 

So it's not so much that the file will be rotated; it will always have the same name, just the contents may differ. For example, the oldest logs will get deleted and the new ones appended to an identical file, then the old file gets removed and the new one added in its place.

It might seem redundant, but with the way our logs will be handled it might be the best option.

Yeah, then fingerprint probably makes sense just to be safe.

For the timestamp issue, I would recommend using the console output in Filebeat when testing -- this will make sure you know what is in the document before it gets sent to Graylog.
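
Keep in mind Filebeat only runs one output at a time, so while testing you'd comment out the logstash output and enable the console one, something like:

# output.logstash:
#    hosts: ["${user.graylog_host}:5044"]

output.console:
  pretty: true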

I think Graylog may be adding the timestamp field when it receives the document and that it's not actually on the document anymore after the drop_fields processor.

Can you run the pipeline with the console output and see if you find the timestamp field in the documents printed to console?