Filebeat + pipeline + ES log duplication. Help!

When I use the following Filebeat + ingest pipeline configuration to collect nginx logs, duplicate documents appear in Elasticsearch. I have tested many times but cannot find the cause. Could you please help me figure out what is causing the problem?
my-filebeat.yml

filebeat.inputs:
- type: filestream
  paths:
  - /root/rsync_logs/nginx_logs/gate.access.log
  tags: ["gate","nginx"]
processors:
- drop_fields:
    fields:
    - agent.ephemeral_id
    - agent.hostname
    - agent.id
    - agent.type
    - agent.version
    - ecs.version
    - input.type
    - log.offset
    - version

output.elasticsearch:
  hosts: ["192.168.112.44:30200"]
  protocol: "https"
  username: "elastic"
  password: "XXXXXXXXX"
  ssl.verification_mode: none
  timezone: "Asia/Shanghai"
  enabled: true
  pipelines:
  - pipeline: "biot_nginx_log_pipeline"
    when.contains:
      tags: "nginx"
  indices:
  - index: "biot-log-nginx-gate-%{+yyyy.MM.dd}"
    when.contains:
      tags: "gate"
setup.ilm.enabled: false
setup.template.name: "biot-log"
setup.template.pattern: "biot-log*"
setup.template.overwrite: true
setup.template.settings:
  index.number_of_shards: 3
  index.number_of_replicas: 0

In Elasticsearch I can see that the duplicate documents are completely identical, but the original log file contains each line only once, and the number of duplicates keeps growing.
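
For example, this aggregation surfaces the repeated documents (a rough check; it assumes the index template gives message a message.keyword subfield, which may not be the case if the field is mapped as text only):

GET biot-log-nginx-gate-*/_search
{
  "size": 0,
  "aggs": {
    "duplicate_messages": {
      "terms": {
        "field": "message.keyword",
        "min_doc_count": 2,
        "size": 10
      }
    }
  }
}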


This is my ingest pipeline:

[
  {
    "grok": {
      "field": "message",
      "patterns": [
        "%{WORD:client_address}:%{IP:client_address}###%{WORD:client_user}:%{DATA:client_user}###%{WORD:real_ip}:%{DATA:real_ip}###%{WORD:visit_time}:%{TIMESTAMP_ISO8601:visit_time}###%{WORD:request_uri}:%{GREEDYDATA:request_uri}###%{WORD:request_host}:%{DATA:request_host}###%{WORD:http_status}:%{NUMBER:http_status}###%{WORD:upstream_status}:%{DATA:upstream_status}###%{WORD:traffic}:%{NUMBER:traffic}###%{WORD:original_address}:%{DATA:original_address}###%{WORD:http_user_agent}:%{GREEDYDATA:http_user_agent}###%{WORD:request_length}:%{NUMBER:request_length}###%{WORD:load_balancer}:%{DATA:load_balancer}###%{WORD:processing_time}:%{NUMBER:processing_time}###%{WORD:upstream_response_time}:%{DATA:upstream_response_time}###%{WORD:http_acl_t}:%{DATA:http_acl_t}###"
      ],
      "ignore_failure": true
    }
  },
  {
    "date": {
      "field": "visit_time",
      "target_field": "@timestamp",
      "formats": [
        "ISO8601"
      ],
      "timezone": "Asia/Shanghai"
    }
  },
  {
    "geoip": {
      "field": "client_address",
      "target_field": "geo",
      "ignore_failure": true
    }
  }
]
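
The pipeline itself can be tested in isolation with the simulate API (a sketch; the message value is a placeholder to replace with a real line from gate.access.log):

POST _ingest/pipeline/biot_nginx_log_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "<one raw line from gate.access.log>"
      }
    }
  ]
}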

How are you putting logs in this folder?

Its name, rsync_logs, suggests that you are using rsync to copy files into this folder.

If I'm not wrong, by default rsync writes the incoming data to a temporary file on the destination and then, once the transfer completes, renames that temporary file over the existing one, replacing it.
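
You can verify this on the destination by comparing the inode before and after a sync (illustrative commands; adapt the rsync invocation to your setup):

stat -c '%i %n' /root/rsync_logs/nginx_logs/gate.access.log
# ... run your usual rsync job here ...
stat -c '%i %n' /root/rsync_logs/nginx_logs/gate.access.log

If the two inode numbers differ, the file is being replaced rather than updated in place.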

The main issue is that this replaces the file and therefore changes its inode. Filebeat uses the inode to track files, so if the name is the same but the inode is different, Filebeat considers it a new file, reads it again from the beginning, and that leads to duplication.
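
If you cannot change how the files arrive, one option in recent Filebeat 8.x releases is to switch the filestream input to fingerprint-based file identity, so Filebeat identifies a file by a hash of its first bytes instead of by device and inode. A sketch, not a tested config (the id value is just an example; each filestream input should have a unique id):

filebeat.inputs:
- type: filestream
  id: gate-nginx-access
  paths:
  - /root/rsync_logs/nginx_logs/gate.access.log
  tags: ["gate","nginx"]
  # identify files by a hash of their first 1024 bytes instead of the inode
  file_identity.fingerprint: ~
  prospector.scanner.fingerprint.enabled: true

Alternatively, running rsync with --inplace updates the destination file in place instead of replacing it, which keeps the inode stable.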
