File was truncated. Begin reading file from offset 0 multiple times

Hi,
I'm new to the ELK stack and currently exploring its features for log aggregation.

We are using Tomcat 9 and the ELK stack 8.5.1, and we are facing an issue where logs are duplicated in Kibana more than 3 to 5 times, depending on how the log file is crawled. This is our Log4j2 config:

<Appender filePattern="@project.home@/logs/mimou.%d{yyyy-MM-dd}.json.log" ignoreExceptions="false" name="JSON_FILE" type="RollingFile">
    <JSONLayout compact="true" eventEol="true" properties="true" stacktraceAsString="true" includeTimeMillis="true">
        <KeyValuePair key="timestamp" value="$${date:yyyy-MM-dd'T'HH:mm:ss.SSSZ}" />
    </JSONLayout>
    <TimeBasedTriggeringPolicy />
    <DirectWriteRolloverStrategy />
</Appender>

and our Filebeat config:

filebeat.inputs:
- type: log
  paths:
  - '/opt/liferay/logs/mimou*.json.log'
  json.keys_under_root: false
  json.add_error_key: true
  json.overwrite_keys: true
  json.message_key: messages
#  close_inactive: 10m
#  clean_inactive: 25h
#  ignore_older: 24h
  fields:
      environment: ${ENVIRONMENT}
      stage: ${STAGE}
      cluster: ${CLUSTER}
  name: filebeat
  tags: ["${ENVIRONMENT}"]
  multiline.pattern: '^[[:space:]]+|^Caused by:'
  multiline.negate: false
  multiline.match: after
logging.to_stderr: true
output.logstash:
  enabled: true
  hosts: ["XX.XXXXXXXXX"]
  ssl.enabled: false
  ssl.certificate_authorities: ["./certs/XXXXXXXXX"]
  ssl.certificate: "./certs/XXXXXXXXX.crt"
  ssl.key: "./certs/secrets/XXXXXXXXX.key"

and the Filebeat logs are showing this multiple times:

{"log.level":"info","@timestamp":"2024-12-04T09:02:43.982Z","log.logger":"input.harvester","log.origin":{"file.name":"log/harvester.go","file.line":329},"message":"File was truncated. Begin reading file from offset 0.","service.name":"filebeat","input_id":"4e1d9019-83bb-477d-a4e1-1aa6424721d0","source_file":"/opt/system/logs/moumou.2024-12-04.json.log","state_id":"native::14411563612684419072-1048735","finished":false,"os_id":"14411563612684419072-1048735","harvester_id":"23b2693c-947b-4788-a1b7-17804cf56c19","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-12-04T09:02:50.989Z","log.logger":"input.harvester","log.origin":{"file.name":"log/harvester.go","file.line":310},"message":"Harvester started for paths: [/opt/system/logs/moumou*.json.log]","service.name":"filebeat","input_id":"4e1d9019-83bb-477d-a4e1-1aa6424721d0","source_file":"/opt/system/logs/moumou.2024-12-04.json.log","state_id":"native::14411563612684419072-1048735","finished":false,"os_id":"14411563612684419072-1048735","old_source":"/opt/system/logs/moumou.2024-12-04.json.log","old_finished":true,"old_os_id":"14411563612684419072-1048735","harvester_id":"8477a7cb-8230-4298-bf91-fe2edddd5134","ecs.version":"1.6.0"}

The log file is treated as a new file each time a new line is written, and it is crawled again from offset 0.

Anyone encountered this before?

Thanks in advance

Hi @Mimouz,

This is a common problem with some log rotation approaches, as detailed in Log rotation results in lost or duplicate events | Filebeat Reference [8.16] | Elastic. Maybe there is a way to configure whatever tool performs the rotation on your system to do so without truncating the file?
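If Log4j2 itself is doing the rotation, one non-truncating variant worth testing would be to give your appender a fixed fileName and let the default rollover strategy rename the completed file at rollover, instead of writing directly into the date-stamped files with DirectWriteRolloverStrategy. This is only a rough sketch based on your posted config, and the fixed file name is just an example; I can't say whether this is what is actually truncating your file:

<Appender fileName="@project.home@/logs/mimou.json.log"
          filePattern="@project.home@/logs/mimou.%d{yyyy-MM-dd}.json.log"
          ignoreExceptions="false" name="JSON_FILE" type="RollingFile">
    <JSONLayout compact="true" eventEol="true" properties="true" stacktraceAsString="true" includeTimeMillis="true">
        <KeyValuePair key="timestamp" value="$${date:yyyy-MM-dd'T'HH:mm:ss.SSSZ}" />
    </JSONLayout>
    <TimeBasedTriggeringPolicy />
    <!-- With a fixed fileName, the default strategy renames the active file to filePattern
         at rollover and starts a new file, instead of writing directly into the
         date-stamped files (no DirectWriteRolloverStrategy). -->
    <DefaultRolloverStrategy />
</Appender>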


Also, you are using the deprecated log input; please switch your input to the filestream input.
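For reference, a minimal filestream equivalent of your log input could look like the sketch below. The path is taken from your config; the id is just an example name, and the ndjson options are my assumptions for mapping the old json.* settings, so please double-check them against the filestream docs:

filebeat.inputs:
- type: filestream
  # filestream requires a unique id per input; the name here is only an example
  id: mimou-json-logs
  paths:
    - '/opt/liferay/logs/mimou*.json.log'
  parsers:
    - ndjson:
        # JSONLayout normally emits the log text under "message"
        # (your log input had json.message_key: messages)
        message_key: message
        add_error_key: true
        overwrite_keys: true
        # there is no keys_under_root here; the ndjson `target` option controls
        # where the decoded fields end up (see the filestream docs)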


Unfortunately, the problem is not resolved using filestream:

{......"message":"File was truncated as offset (4245) > size (3040): /opt/mimouz/logs/MY_LOG_FILE_NAME.json.log".......}

{....... "message":"File was truncated. Begin reading file from offset 0. Path=/opt/mimouz/logs/MY_LOG_FILE_NAME.json.log",.........}

The log file has the same inode each time, and we think the issue may be caused by the fact that the log files are mounted on a file share.

Please share your updated configuration that relies on filestream, and please share the details of how you have mounted the network share (the mount settings used).
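For the mount settings, something like the commands below (run inside the pod or on the node, assuming the share is mounted over SMB/CIFS) should show the effective options:

# list CIFS/SMB mounts and the options they were mounted with
mount | grep -i cifs
# or, if util-linux is available:
findmnt -t cifs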

First of all, Happy New Year! :tada: I hope you had a wonderful start to 2025.

Apologies for the delayed reply.

This is my filebeat.yml file:

filebeat.inputs:
- type: filestream
  id: liferay_logs
  paths:
    - /opt/liferay/logs/liferay.json.log
  fields:
      environment: ${ENVIRONMENT}
      stage: ${STAGE}
      cluster: ${CLUSTER}
      app: "liferay"  
  parsers:
    - ndjson:
        # Decode JSON options. Enable this if your logs are structured in JSON.
        # JSON key on which to apply the line filtering and multiline settings. This key
        # must be top level and its value must be a string, otherwise it is ignored. If
        # no text key is defined, the line filtering and multiline features cannot be used.
        message_key: message

        # By default, the decoded JSON is placed under a "json" key in the output document.
        # If you enable this setting, the keys are copied to the top level of the output document.
        keys_under_root: false

        # If keys_under_root and this setting are enabled, then the values from the decoded
        # JSON object overwrite the fields that Filebeat normally adds (type, source, offset, etc.)
        # in case of conflicts.
        overwrite_keys: true

        # If this setting is enabled, Filebeat adds an "error.message" and "error.key: json" key in case of JSON
        # unmarshaling errors or when a text key is defined in the configuration but cannot
        # be used.
        add_error_key: true
processors:
  - decode_json_fields:
      fields: ["json.log"]
      process_array: true
      max_depth: 5
      target: "payload"
      overwrite_keys: true
  - rename:
      fields:
        - from: "json.log"
          to: "payload.message"      
        - from: "file"
          to: "file.name"
        - from: "class"
          to: "log.origin.file.name"
        - from: "method"
          to: "log.origin.function"
      fail_on_error: false
      ignore_missing: true
  - copy_fields:
      fields:
        - from: level
          to: log.level
        - from: fields.app
          to: event.module
        - from: fields.app
          to: service.type
        - from: fields.environment
          to: host.environment
        - from: fields.stage
          to: host.type
      fail_on_error: false
      ignore_missing: true
           
logging.level: debug
logging.to_stderr: false
logging.metrics.enabled: false

output.logstash:
  enabled: true
  hosts: ["XX.XX.XX.XX:5044"]
  ssl.enabled: false
  ssl.certificate_authorities: ["./certs/secrets/certificate_authority/ca/ca.crt"]
  ssl.certificate: "./certs/secrets/certificates/filebeat/filebeat.crt"
  ssl.key: "./certs/secrets/certificates/filebeat/filebeat.key"

And regarding the Azure Files share, it is mounted as a volume in my Helm chart:

      volumes:
        - name: liferay-data
          azureFile:
            secretName: azure-storage-account-{{ .Values.azure.storage.account.name }}-secret
            shareName: myapp-{{ .Values.stage }}-liferay-data
            readOnly: false

Thanks in advance for your support, and let me know if you need anything further!