Filebeat 7.14.0 filestream input: log.offset field is the character count of the line

I am using Filebeat 7.14.0 with input type filestream and output.console with pretty: true.
I see that the value of log.offset reflects the size of event.message (apparently the line length plus one for the newline) and not the offset from the beginning of the file. I used to use the log input, where log.offset was the offset from the beginning of the file. Please confirm whether the filestream input is supposed to behave this way, differently from the log input. I ask because the filestream input is touted as a better log file processor than the log input. I need the offset from the beginning of the file as a unique identifier of the record.
My filebeat.yml file looks like the following:
filebeat.inputs:
  - type: filestream
    paths:
      - /tmp/mikes/*.dat
    file_identity.path: ~
output.console:
  pretty: true

I have a data file with simple data:
/tmp/mikes/data.dat
123456789012345
12345678901234567890
123456789012345678901234567890

The output shows each line as a separate event, with an offset that reflects the size of that line rather than its position in the file:
{
"@timestamp": "2021-08-31T18:04:35.580Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.14.0"
},
"input": {
"type": "filestream"
},
"ecs": {
"version": "1.10.0"
},
"host": {
"name": "xxx.dev.xxx.net"
},
"agent": {
"version": "7.14.0",
"hostname": "xxx",
"ephemeral_id": "5761f0a1-279c-403d-971a-23ecc9c3e52c",
"id": "bfa07733-0278-4c9d-a8bc-6ff1255bf6a5",
"name": "xxx.dev.xxx.net",
"type": "filebeat"
},
"environment": [
"remote-filebeat"
],
"log": {
"path": "/tmp/mikes/data.dat",
"offset": 16
},
"message": "123456789012345",
"tags": [
"xxx-log",
"xxx.dev.xxx.net"
]
}
{
"@timestamp": "2021-08-31T18:04:35.580Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.14.0"
},
"log": {
"offset": 21,
"path": "/tmp/mikes/data.dat"
},
"message": "12345678901234567890",
"tags": [
"xxx-log",
"xxx.dev.xxx.net"
],
"input": {
"type": "filestream"
},
"ecs": {
"version": "1.10.0"
},
"host": {
"name": "xxx.dev.xxx.net"
},
"agent": {
"type": "filebeat",
"version": "7.14.0",
"hostname": "xxx",
"ephemeral_id": "5761f0a1-279c-403d-971a-23ecc9c3e52c",
"id": "bfa07733-0278-4c9d-a8bc-6ff1255bf6a5",
"name": "xxx.dev.xxx.net"
},
"environment": [
"remote-filebeat"
]
}
{
"@timestamp": "2021-08-31T18:04:35.581Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.14.0"
},
"host": {
"name": "xxx.dev.xxx.net"
},
"agent": {
"version": "7.14.0",
"hostname": "xxx",
"ephemeral_id": "5761f0a1-279c-403d-971a-23ecc9c3e52c",
"id": "bfa07733-0278-4c9d-a8bc-6ff1255bf6a5",
"name": "xxx.dev.xxx.net",
"type": "filebeat"
},
"environment": [
"remote-filebeat"
],
"message": "123456789012345678901234567890",
"log": {
"offset": 31,
"path": "/tmp/mikes/data.dat"
},
"input": {
"type": "filestream"
},
"ecs": {
"version": "1.10.0"
}
}
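To make the difference concrete, here is a minimal Python sketch that compares the per-line size with the offset from the beginning of the file. It assumes the data file contains exactly the three lines shown above, each terminated by a single "\n"; the helper name line_offsets is hypothetical and not part of Filebeat.

```python
def line_offsets(data: bytes):
    """Yield (start, end, message) for each line in data.

    start and end are byte offsets from the beginning of the file;
    end includes the line terminator.
    """
    pos = 0
    for raw in data.splitlines(keepends=True):
        start, pos = pos, pos + len(raw)
        yield start, pos, raw.rstrip(b"\r\n").decode()


# Assumed file content: the three sample lines, each ending in "\n".
sample = (
    b"123456789012345\n"
    b"12345678901234567890\n"
    b"123456789012345678901234567890\n"
)

offsets = list(line_offsets(sample))
for start, end, message in offsets:
    # end - start is the line size including the newline.
    print(f"start={start:2d} end={end:2d} size={end - start:2d} message={message}")
```

Under that assumption the sizes come out as 16, 21 and 31, which are the log.offset values in the events above, while the offsets from the beginning of the file are 0, 16 and 37 (line starts) or 16, 37 and 68 (line ends).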

The registry file log.json does show the correct offset at the end of the file.
log.json
{"op":"set","id":1}
{"k":"filestream::.global::path::/tmp/mikes/data.dat","v":{"ttl":0,"updated":[281470681743360,18446744011573954816],"cursor":null,"meta":{"source":"/tmp/mikes/data.dat","identifier_name":"path"}}}
{"op":"set","id":2}
{"k":"filestream::.global::path::/tmp/mikes/data.dat","v":{"ttl":1800000000000,"updated":[280444765226302,1630433075],"cursor":null,"meta":{"source":"/tmp/mikes/data.dat","identifier_name":"path"}}}
{"op":"set","id":3}
{"k":"filestream::.global::path::/tmp/mikes/data.dat","v":{"ttl":1800000000000,"updated":[280444765628790,1630433075],"cursor":{"offset":68},"meta":{"source":"/tmp/mikes/data.dat","identifier_name":"path"}}}
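For anyone who wants to check the registry value programmatically, below is a rough Python sketch that reads the last cursor offset per key from log.json. The record layout is inferred only from the excerpt above (an operation line followed by a key/value line), so treat it as an assumption rather than a documented format; the function name last_cursor_offsets is hypothetical.

```python
import json


def last_cursor_offsets(lines):
    """Return {key: offset} using the last non-null cursor seen for each key."""
    offsets = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "k" not in record:
            continue  # operation header such as {"op":"set","id":1}
        cursor = record.get("v", {}).get("cursor")
        if cursor is not None and "offset" in cursor:
            offsets[record["k"]] = cursor["offset"]
    return offsets


# Two operations copied from the registry excerpt above.
registry_excerpt = [
    '{"op":"set","id":2}',
    '{"k":"filestream::.global::path::/tmp/mikes/data.dat","v":{"ttl":1800000000000,"updated":[280444765226302,1630433075],"cursor":null,"meta":{"source":"/tmp/mikes/data.dat","identifier_name":"path"}}}',
    '{"op":"set","id":3}',
    '{"k":"filestream::.global::path::/tmp/mikes/data.dat","v":{"ttl":1800000000000,"updated":[280444765628790,1630433075],"cursor":{"offset":68},"meta":{"source":"/tmp/mikes/data.dat","identifier_name":"path"}}}',
]

result = last_cursor_offsets(registry_excerpt)
print(result)
```

For the excerpt this yields an offset of 68 for /tmp/mikes/data.dat, which is the sum of the three line sizes (16 + 21 + 31).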

You are right! I will open a PR soon with the fix. Thank you for your report and for trying the filestream input!

PR: Store offset in log.offset field of events from the filestream input by kvch · Pull Request #27688 · elastic/bea

Thank you for reviewing this and making the correction. I have also opened a related topic about a difference in the event metadata schema between the log and filestream inputs: the file path is log.file.path with the log input but log.path with the filestream input. Here is a link to that topic: Filebeat input types log and filestream metadata for file path/name differ
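Until the schemas match, a consumer can read the path from either location. A minimal Python sketch, assuming events arrive as parsed JSON dictionaries shaped like the console output earlier in this thread; the helper name event_file_path is hypothetical and the log-input event below is a made-up illustration, not real output:

```python
def event_file_path(event: dict):
    """Return the source file path from either schema, or None if absent."""
    log = event.get("log", {})
    file_info = log.get("file")
    # Layout described for the log input: log.file.path
    if isinstance(file_info, dict) and "path" in file_info:
        return file_info["path"]
    # Layout seen in the filestream events above: log.path
    return log.get("path")


# Shape taken from the filestream output in this thread.
filestream_event = {"log": {"path": "/tmp/mikes/data.dat", "offset": 16}}
# Illustrative shape for the log input (assumed, not copied from real output).
log_event = {"log": {"file": {"path": "/tmp/mikes/data.dat"}, "offset": 0}}

print(event_file_path(filestream_event))
print(event_file_path(log_event))
```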

Thanks, I will look at that one as well.
