Filebeat not considering registry and resending data

hi,

I am running Filebeat in a pod on Kubernetes. Filebeat keeps its registry on an external volume, on a CephFS cluster, so that it does not lose track of files already sent after each reboot.
Somehow this is not working: it looks like Filebeat is resending data after a container crash. See the attached image, where the UniqueID line is what I would expect and the dark green shows a lot of duplicates.

This is my filebeat config:

...
filebeat.inputs:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log
  ignore_older: 72h
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  exclude_lines: ["^#"]
  clean_removed: false
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/zoom-dashboard/zoom-*pasticipants.log*
  fields:  
    document_type: zparticipants 

- type: log
  ignore_older: 72h
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  exclude_lines: ["^#"]
  clean_removed: false
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/zoom-dashboard/zoom-*live.log*
  fields:  
    document_type: aggrzeventslive

- type: log
  ignore_older: 72h
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  exclude_lines: ["^#"]
  clean_removed: false
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/zoom-dashboard/zoom-*past.log*
  fields:  
    document_type: zeventspast


#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["XXXXX:5084"]
...

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/zoom-dashboard
  name: filebeat
  keepfiles: 7
  permissions: 0644

#========================= Filebeat global options ============================

filebeat.registry.path: /var/log/zoom-dashboard/registry

/var/log/zoom-dashboard is a mount to an external volume.
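As I understand it, Filebeat's default "native" file identity is the (device, inode) pair from stat(), so I put together a small sketch to record that pair for my log files before and after a container restart (paths are from my setup; adjust as needed):

```python
import glob
import os

# Filebeat's default "native" file identifier is the (device, inode)
# pair from stat(). If the device number changes across container
# restarts, registry entries no longer match and files get re-read.
for path in sorted(glob.glob("/var/log/zoom-dashboard/zoom-*.log*")):
    st = os.stat(path)
    print(f"dev={st.st_dev} inode={st.st_ino} {path}")
```

Running this before and after a restart should show whether the CephFS mount presents a stable device number to the pod.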

To clarify further, here is the Kubernetes deployment description:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: manila-cephfs-share
provisioner: manila-provisioner
parameters:
  type: "Meyrin CephFS"
  zones: nova
  osSecretName: os-trustee
  osSecretNamespace: kube-system
  protocol: CEPHFS
  backend: csi-cephfs
  csi-driver: cephfs.csi.ceph.com
  osShareID: b9XXX94bf-XXX-4dcc-XXX-0daf8XXXXXca
  osShareAccessID: 03XXXX87-cf4e-41XX-XbX2-b0aaXXXXX2e
reclaimPolicy: Retain
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: manila-cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100G
  storageClassName: manila-cephfs-share
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zoom-dashboard
spec:
  selector:
    matchLabels:
      app: zoom-dashboard
  replicas: 1
  template:
    metadata:
      labels:
        app: zoom-dashboard
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - name: filebeats
        image: docker.elastic.co/beats/filebeat:7.10.0
        command: ["filebeat", "-e", "-strict.perms=false", "-c", "/etc/zoom-dashboard/filebeat.yml"]
        volumeMounts:
          - mountPath: /etc/grid-security
            name: etc-grid-security
          - mountPath: /var/log/zoom-dashboard
            name: zoomdata
          - mountPath: /etc/zoom-dashboard/filebeat.yml
            subPath: filebeat.yml
            name: filebeat-config
            readOnly: true  
          - mountPath: /etc/grid-security/Grid_CA_certificate.pem
            subPath: Grid_CA_certificate.pem
            name: gridcacert
            readOnly: true 
      - name: zoom-dashboard
        image: gitlab-registry.zzz.zz/videoconference/zoom-dashboard:latest
        command: ["/app/dispatch_collectors.sh"]
        volumeMounts:
          - mountPath: /var/log/zoom-dashboard
            name: zoomdata
          - mountPath: /app/config.py
            subPath: config.py
            name: config-py-config
            readOnly: true
      volumes: 
      - name: etc-grid-security
        hostPath:
          path: /etc/grid-security
      - name: zoomdata
        persistentVolumeClaim:
          claimName: manila-cephfs-pvc
          readOnly: false
      - name: config-py-config
        configMap:
          name: config-py-config
          items:
          - key: config.py
            path: config.py
      - name: filebeat-config
        configMap:
          name: filebeat-config
          items:
            - key: filebeat.yml
              path: filebeat.yml 
      - name: gridcacert
        configMap:
          name: gridcacert
          items:
            - key: Grid_CA_certificate.pem
              path: Grid_CA_certificate.pem
      imagePullSecrets:
      - name: regcred

Thank you for your time, any hint is welcome.

hi again,

I wanted to add that, while checking the registry content, I can see that the file entries are indeed there, e.g.:

root@zoom-dashboard-55b8b45b6b-vdhc9:/var/log/zoom-dashboard/registry/filebeat# cat 11846401.json | grep 02-26 | grep pasticipants | grep meetin
{"_key":"filebeat::logs::native::1101008176881-102","source":"/var/log/zoom-dashboard/zoom-meetings-pasticipants.log.2021-02-26","timestamp":[45157259,1614571316],"ttl":-1,"id":"native::1101008176881-102","prev_id":"","FileStateOS":{"inode":1101008176881,"device":102},"identifier_name":"native","offset":14373249,"type":"log"},
{"_key":"filebeat::logs::native::1101008176881-1048642","prev_id":"","source":"/var/log/zoom-dashboard/zoom-meetings-pasticipants.log.2021-02-26","ttl":-1,"type":"log","FileStateOS":{"inode":1101008176881,"device":1048642},"id":"native::1101008176881-1048642","offset":14373249,"timestamp":[602466073,1614571658],"identifier_name":"native"},
{"_key":"filebeat::logs::native::1101008176881-1048653","offset":14373249,"ttl":-1,"type":"log","prev_id":"","source":"/var/log/zoom-dashboard/zoom-meetings-pasticipants.log.2021-02-26","FileStateOS":{"inode":1101008176881,"device":1048653},"identifier_name":"native","id":"native::1101008176881-1048653","timestamp":[169562591,1614571317]},

I don't understand all the parameters, but the device changes despite the inode (i.e. the file) being the same. I'm not sure whether that plays a role. My Ceph sysadmin commented that this attribute has no meaning to him.

Thank you