Filebeat resends all events of a logfile every day

Hi

The logfile always has the same name, but a housekeeping job copies and zips the file every night, with the side effect that the file gets a new inode. I guess that the inode is the harvester's primary key in the registry, so Filebeat assumes a new file and doesn't care that the name is the same.
tail_files: true in filebeat.yml doesn't help, because for Filebeat it is a new file.

You can reproduce the effect by editing a test logfile with vi. If you save the file with :wq, it gets a new inode (run stat before and after saving), and Elastic shows all entries of the file again. But if you append with echo 'errormessage' >> filename, only the last error is sent.
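The stat comparison described above can also be scripted. The following is only a minimal sketch in Python, assuming a POSIX filesystem and assuming that the editor or housekeeping job saves by writing a fresh file and renaming it over the original; the helper names (inode_of, append_line, rewrite_with_line, demo) are made up for this illustration.

```python
import os
import tempfile


def inode_of(path):
    """Return the inode number the filesystem currently reports for path."""
    return os.stat(path).st_ino


def append_line(path, line):
    """Append in place, like `echo ... >> file`; the existing file is reused."""
    with open(path, "a") as handle:
        handle.write(line + "\n")


def rewrite_with_line(path, line):
    """Write a fresh copy plus one line, then rename it over the original.

    Assumption: this mimics an editor or job that replaces the file
    instead of modifying it in place.
    """
    with open(path) as handle:
        content = handle.read()
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as handle:
        handle.write(content + line + "\n")
    os.replace(tmp_path, path)


def demo():
    """Return the inode before any change, after an append, and after a rewrite."""
    with tempfile.TemporaryDirectory() as workdir:
        log_path = os.path.join(workdir, "dummy.log")
        with open(log_path, "w") as handle:
            handle.write("first entry\n")
        original = inode_of(log_path)
        append_line(log_path, "ORA-00001: appended")
        after_append = inode_of(log_path)
        rewrite_with_line(log_path, "ORA-00002: rewritten")
        after_rewrite = inode_of(log_path)
        return original, after_append, after_rewrite


if __name__ == "__main__":
    original, after_append, after_rewrite = demo()
    print("append kept inode:", original == after_append)
    print("rewrite kept inode:", original == after_rewrite)
```

The append keeps the inode because the same file is opened and extended, while the rename puts a different file (created while the old one still existed) behind the same name.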

Does anybody have an idea how to solve the problem?

Cheers,
Heinz

Take a look at the file_identity property: https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-log.html
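To illustrate why the choice of file identity matters here, the following toy model in Python compares a registry keyed by inode with one keyed by path. It is only a sketch under stated assumptions: ToyRegistry and simulate are hypothetical names, offsets are counted in lines rather than bytes, the inode numbers 100 and 200 are invented, and this is not how Filebeat is actually implemented.

```python
class ToyRegistry:
    """Toy stand-in for a harvester registry; not Filebeat's real code."""

    def __init__(self, identity):
        # identity is "inode" or "path" (hypothetical mode names for this sketch)
        self.identity = identity
        self.offsets = {}

    def _key(self, path, inode):
        """Pick the registry key according to the configured identity."""
        return path if self.identity == "path" else inode

    def harvest(self, path, inode, lines):
        """Return the lines not yet seen for this file identity."""
        key = self._key(path, inode)
        start = self.offsets.get(key, 0)  # unknown identity: start at the top
        self.offsets[key] = len(lines)
        return lines[start:]


def simulate(identity):
    """Harvest a file, then harvest it again after a rewrite changed its inode."""
    registry = ToyRegistry(identity)
    path = "/home/delphe/dummy.log"
    registry.harvest(path, 100, ["ORA-1", "ORA-2"])
    # Same name and one new line, but a different (made-up) inode number.
    return registry.harvest(path, 200, ["ORA-1", "ORA-2", "ORA-3"])


if __name__ == "__main__":
    print("inode identity resends:", simulate("inode"))
    print("path identity resends:", simulate("path"))
```

In this model the inode-keyed registry treats the rewritten file as unknown and returns every line again, while the path-keyed registry continues from the stored position.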

OK, this seems to be the solution (going by the description). I put a lot of time into it, with no success.
In my case I have exact paths/filenames without wildcards, so file_identity.path should be the way. I tried every possible configuration of the file_identity.path property, but it doesn't work. One new entry added with vi, and Filebeat sends everything again.
There is no documentation about the value of path. I found just one example:
file_identity.path: ~
I tried
file_identity.path:
file_identity.path: ~
file_identity.path: true
file_identity.path: 'path/filename'

but... nothing works for me. What does ~ mean?

Can you post the whole config you're using? The property file_identity.path: ~ should be fine.

This is the relevant section:

filebeat.inputs:

- type: log

  enabled: true
  paths:
    - /disk00/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    - /disk00/app/oracle/diag/rdbms/aitdev01/AITDEV01/trace/alert_AITDEV01.log
    - /disk00/app/oracle/diag/rdbms/aitdev02/AITDEV02/trace/alert_AITDEV02.log
    - /disk00/app/oracle/diag/rdbms/aitma02/AITMA02/trace/alert_AITMA02.log
    - /disk00/app/oracle/diag/rdbms/aitprd01/AITPRD01/trace/alert_AITPRD01.log
    - /disk00/app/oracle/diag/rdbms/aitprd02/AITPRD02/trace/alert_AITPRD02.log
    - /disk00/app/oracle/diag/rdbms/cusprd01/CUSPRD01/trace/alert_CUSPRD01.log
    - /disk00/app/oracle/diag/rdbms/cusprd03/CUSPRD03/trace/alert_CUSPRD03.log
    - /disk00/app/oracle/diag/rdbms/cusprd05/CUSPRD05/trace/alert_CUSPRD05.log
    - /disk00/app/oracle/diag/rdbms/cusqa02/CUSQA02/trace/alert_CUSQA02.log
    - /disk00/app/oracle/diag/rdbms/gfadev/GFADEV/trace/alert_GFADEV.log
    - /disk00/app/oracle/diag/rdbms/gfaprd/GFAPRD/trace/alert_GFAPRD.log
    - /disk00/app/oracle/diag/rdbms/hipadb/HIPADB/trace/alert_HIPADB.log
    - /disk00/app/oracle/diag/rdbms/prodb/PRODB/trace/alert_PRODB.log
    - /home/delphe/dummy.log
  
  #tail_files: true
  file_identity.path: ~

  include_lines: ['ORA-']                               


  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}'      # Regex for the pattern which matches the beginning of the interesting logentry

  multiline.negate: true                               
  multiline.match: after                               


  # Example of an error log entry:
  # 2020-04-19T06:00:00.497793+02:00
  # Errors in file /disk00/app/oracle/diag/rdbms/aitdev01/AITDEV01/trace/AITDEV01_j003_143635.trc:
  # ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_4628"
  # ORA-38153: Software edition is incompatible with SQL plan management.
  # ORA-06512: at "SYS.DBMS_SPM_INTERNAL", line 6202
  # ORA-06512: at "SYS.DBMS_SPM", line 2806
  # ORA-06512: at line 34
  # 2020-04-19T06:00:19.115017+02:00                     # Timestamp, i.e. the beginning of the next error. The last entry in the file will not appear, because it is the last one.

  processors:
  - copy_fields:
      fields:
        - from: message
          to: oracle.message
      fail_on_error: false
      ignore_missing: true

  - truncate_fields:
      fields:
      - message
      max_characters: 33 
      fail_on_error: false
      ignore_missing: true

  - timestamp:
      field: message
      layouts:
        - '2006-01-02T15:04:05.999999-07:00'   # layouts use Go's reference time, not a sample timestamp
      target_field: oracle.timestamp


  # Maybe not the best solution, but it reflects my current knowledge: several add_fields processor instances write the Oracle instance into a new field called "oracle.instance.name"
  - add_fields:
      when:
         contains:
            log.file.path: "ASM1"
      target: oracle.instance
      fields: 
        name: "ASM1"

  - add_fields:
      when:
         contains:
            log.file.path: "AITDEV01"
      target: oracle.instance
      fields: 
        name: "AITDEV01"

  - add_fields:
      when:
         contains:
            log.file.path: "AITDEV02"
      target: oracle.instance
      fields: 
        name: "AITDEV02"

  - add_fields:
      when:
         contains:
            log.file.path: "AITMA02"
      target: oracle.instance
      fields: 
        name: "AITMA02"

  - add_fields:
      when:
         contains:
            log.file.path: "AITPRD01"
      target: oracle.instance
      fields: 
        name: "AITPRD01"

  - add_fields:
      when:
         contains:
            log.file.path: "AITPRD02"
      target: oracle.instance
      fields: 
        name: "AITPRD02"

  - add_fields:
      when:
         contains:
            log.file.path: "CUSPRD01"
      target: oracle.instance
      fields: 
        name: "CUSPRD01"

  - add_fields:
      when:
         contains:
            log.file.path: "CUSPRD05"
      target: oracle.instance
      fields: 
        name: "CUSPRD05"

  - add_fields:
      when:
         contains:
            log.file.path: "CUSPRD03" 
      target: oracle.instance
      fields: 
        name: "CUSPRD03"

  - add_fields:
      when:
         contains:
            log.file.path: "CUSQA02"
      target: oracle.instance
      fields: 
        name: "CUSQA02"       

  - add_fields:
      when:
         contains:
            log.file.path: "GFAPRD"
      target: oracle.instance
      fields: 
        name: "GFAPRD"       

  - add_fields:
      when:
         contains:
            log.file.path: "PRODB"
      target: oracle.instance
      fields: 
        name: "PRODB"       

  - add_fields:
      when:
         contains:
            log.file.path: "HIPADB"
      target: oracle.instance
      fields: 
        name: "HIPADB"       

  - add_fields:
      when:
         contains:
            log.file.path: "GFADEV"
      target: oracle.instance
      fields: 
        name: "GFADEV"       

  # dummy scans the file /home/delphe/dummy.log for test cases
  - add_fields:
      when:
         contains:
            log.file.path: "dummy"
      target: oracle.instance
      fields: 
        name: "dummy"       

  



#============================= Filebeat modules ===============================

filebeat.config.modules:
....

Thanks,
Heinz

One more thing: which version of Filebeat are you using? The file_identity option is really fresh, and I believe it's currently available only on the master and 7.x branches. It hasn't become part of any release yet.

Hi Marcin

We use 7.6.2

Cheers,
Heinz

You can try to build the latest master and verify if that version works for you.

Have you tried to exclude the copied files from harvesting with the exclude_files option?

Thanks for the hint, but I use paths/filenames without wildcards, and the zipped files are in a completely different directory.
