Filebeat not honoring the registry and resending data

Hi,

I am running Filebeat in a pod on Kubernetes. Filebeat keeps its registry on an external volume, on a CephFS cluster, so it does not lose track of which files have already been sent after each restart.
Somehow this is not working: it looks like Filebeat is resending data after a container crash. See the attached image, where the UniqueID line is what I would expect and the dark green shows a lot of duplicates.

This is my Filebeat config:

...
filebeat.inputs:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log
  ignore_older: 72h
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  exclude_lines: ["^#"]
  clean_removed: false
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/zoom-dashboard/zoom-*pasticipants.log*
  fields:  
    document_type: zparticipants 

- type: log
  ignore_older: 72h
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  exclude_lines: ["^#"]
  clean_removed: false
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/zoom-dashboard/zoom-*live.log*
  fields:  
    document_type: aggrzeventslive

- type: log
  ignore_older: 72h
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  exclude_lines: ["^#"]
  clean_removed: false
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/zoom-dashboard/zoom-*past.log*
  fields:  
    document_type: zeventspast


#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["XXXXX:5084"]
...

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/zoom-dashboard
  name: filebeat
  keepfiles: 7
  permissions: 0644

#========================= Filebeat global options ============================

filebeat.registry.path: /var/log/zoom-dashboard/registry

/var/log/zoom-dashboard is mounted from the external volume.
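For reference, this is the kind of quick sanity check I run to confirm the registry on the CephFS mount survives a pod restart (a minimal Python sketch; the path matches filebeat.registry.path above, with the filebeat/ subdirectory that 7.10 creates on my pod):

#!/usr/bin/env python3
# Quick check that the Filebeat registry on the CephFS volume persists and
# keeps getting updated: list every file under the registry directory with
# its size and last-modified time, before and after a pod restart.
import os
import time

REGISTRY_DIR = "/var/log/zoom-dashboard/registry/filebeat"

for name in sorted(os.listdir(REGISTRY_DIR)):
    path = os.path.join(REGISTRY_DIR, name)
    st = os.stat(path)
    mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    print(f"{name:25s} {st.st_size:>10d} bytes  modified {mtime}")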

To clarify further, here are the Kubernetes resource definitions:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: manila-cephfs-share
provisioner: manila-provisioner
parameters:
  type: "Meyrin CephFS"
  zones: nova
  osSecretName: os-trustee
  osSecretNamespace: kube-system
  protocol: CEPHFS
  backend: csi-cephfs
  csi-driver: cephfs.csi.ceph.com
  osShareID: b9XXX94bf-XXX-4dcc-XXX-0daf8XXXXXca
  osShareAccessID: 03XXXX87-cf4e-41XX-XbX2-b0aaXXXXX2e
reclaimPolicy: Retain
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: manila-cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100G
  storageClassName: manila-cephfs-share
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zoom-dashboard
spec:
  selector:
    matchLabels:
      app: zoom-dashboard
  replicas: 1
  template:
    metadata:
      labels:
        app: zoom-dashboard
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - name: filebeats
        image: docker.elastic.co/beats/filebeat:7.10.0
        command: ["filebeat", "-e", "-strict.perms=false", "-c", "/etc/zoom-dashboard/filebeat.yml"]
        volumeMounts:
          - mountPath: /etc/grid-security
            name: etc-grid-security
          - mountPath: /var/log/zoom-dashboard
            name: zoomdata
          - mountPath: /etc/zoom-dashboard/filebeat.yml
            subPath: filebeat.yml
            name: filebeat-config
            readOnly: true  
          - mountPath: /etc/grid-security/Grid_CA_certificate.pem
            subPath: Grid_CA_certificate.pem
            name: gridcacert
            readOnly: true 
      - name: zoom-dashboard
        image: gitlab-registry.zzz.zz/videoconference/zoom-dashboard:latest
        command: ["/app/dispatch_collectors.sh"]
        volumeMounts:
          - mountPath: /var/log/zoom-dashboard
            name: zoomdata
          - mountPath: /app/config.py
            subPath: config.py
            name: config-py-config
            readOnly: true
      volumes: 
      - name: etc-grid-security
        hostPath:
          path: /etc/grid-security
      - name: zoomdata
        persistentVolumeClaim:
          claimName: manila-cephfs-pvc
          readOnly: false
      - name: config-py-config
        configMap:
          name: config-py-config
          items:
          - key: config.py
            path: config.py
      - name: filebeat-config
        configMap:
          name: filebeat-config
          items:
            - key: filebeat.yml
              path: filebeat.yml 
      - name: gridcacert
        configMap:
          name: gridcacert
          items:
            - key: Grid_CA_certificate.pem
              path: Grid_CA_certificate.pem
      imagePullSecrets:
      - name: regcred

Thank you for your time; any hint is welcome.

Hi again,

I wanted to add that, while checking the registry content, I can see that the files are indeed there, e.g.:

root@zoom-dashboard-55b8b45b6b-vdhc9:/var/log/zoom-dashboard/registry/filebeat# cat 11846401.json | grep 02-26 | grep pasticipants | grep meetin
{"_key":"filebeat::logs::native::1101008176881-102","source":"/var/log/zoom-dashboard/zoom-meetings-pasticipants.log.2021-02-26","timestamp":[45157259,1614571316],"ttl":-1,"id":"native::1101008176881-102","prev_id":"","FileStateOS":{"inode":1101008176881,"device":102},"identifier_name":"native","offset":14373249,"type":"log"},
{"_key":"filebeat::logs::native::1101008176881-1048642","prev_id":"","source":"/var/log/zoom-dashboard/zoom-meetings-pasticipants.log.2021-02-26","ttl":-1,"type":"log","FileStateOS":{"inode":1101008176881,"device":1048642},"id":"native::1101008176881-1048642","offset":14373249,"timestamp":[602466073,1614571658],"identifier_name":"native"},
{"_key":"filebeat::logs::native::1101008176881-1048653","offset":14373249,"ttl":-1,"type":"log","prev_id":"","source":"/var/log/zoom-dashboard/zoom-meetings-pasticipants.log.2021-02-26","FileStateOS":{"inode":1101008176881,"device":1048653},"identifier_name":"native","id":"native::1101008176881-1048653","timestamp":[169562591,1614571317]},

I don't understand all the parameters, but the device changes even though the inode (i.e. the file) stays the same. Not sure if that plays a role. My Ceph sysadmin commented to me that this attribute has no meaning to him.
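To make that easier to see, here is a minimal Python sketch that groups the state entries by source file and inode and prints every device number Filebeat has recorded for it. It assumes the checkpoint file is a JSON array of state objects like the lines above (11846401.json is just the file name from my pod). If I understand the default native file identity correctly, the state key combines inode and device, so when the device number changes after a crash the same file looks new to Filebeat and would be read again.

#!/usr/bin/env python3
# Group the registry state entries by (source, inode) and print every
# device number recorded for that file. With the "native" file identity
# the state key combines inode and device, so one file showing up under
# several device numbers means several independent states.
# Assumption: the checkpoint file is a JSON array of state objects like
# the lines pasted above.
import json
from collections import defaultdict

CHECKPOINT = "/var/log/zoom-dashboard/registry/filebeat/11846401.json"

with open(CHECKPOINT) as fh:
    entries = json.load(fh)

by_file = defaultdict(list)
for entry in entries:
    if not isinstance(entry, dict):
        continue
    fs = entry.get("FileStateOS") or {}
    src = entry.get("source")
    if src and "inode" in fs:
        by_file[(src, fs["inode"])].append((fs.get("device"), entry.get("offset")))

for (source, inode), states in sorted(by_file.items()):
    if len(states) > 1:
        print(f"{source} (inode {inode}) has {len(states)} states:")
        for device, offset in states:
            print(f"  device={device} offset={offset}")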

Thank you
