Filebeat pods restarting with "fatal error: concurrent map read and map write"

I have deployed Filebeat -> Logstash -> Elasticsearch -> Kibana on OKD 3.11 running on OpenStack CentOS 7.6 VMs, using the OSS Docker images for the deployment.

docker.elastic.co/beats/filebeat-oss:7.9.3
docker.elastic.co/logstash/logstash-oss:7.9.3
docker.elastic.co/elasticsearch/elasticsearch-oss:7.9.3
docker.elastic.co/kibana/kibana-oss:7.9.3

The Filebeat pods are restarting with "fatal error: concurrent map read and map write".
I don't see any issues with Filebeat functionality or performance, but I would like to know whether this is expected.

An observation here -
The Filebeat pod restarts every time the master API and etcd containers restart.

Filebeat configuration file -

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          tags:
            - "kube-logs"
          templates:
            - condition:
                or:
                  - contains:
                      kubernetes.pod.name: "ne-mgmt"
                  - contains:
                      kubernetes.pod.name: "list-manager"
                  - contains:
                      kubernetes.pod.name: "scheduler-mgmt"
                  - contains:
                      kubernetes.pod.name: "sync-ne"
                  - contains:
                      kubernetes.pod.name: "file-manager"
                  - contains:
                      kubernetes.pod.name: "dash-board"
                  - contains:
                      kubernetes.pod.name: "ne-db-manager"
                  - contains:
                      kubernetes.pod.name: "config-manager"
                  - contains:
                      kubernetes.pod.name: "report-manager"
                  - contains:
                      kubernetes.pod.name: "clean-backup"
                  - contains:
                      kubernetes.pod.name: "warrior"
                  - contains:
                      kubernetes.pod.name: "ne-backup"
                  - contains:
                      kubernetes.pod.name: "ne-restore"
              config:
                - type: container
                  paths:
                    - "/var/log/containers/*-${data.kubernetes.container.id}.log"
    logging.level: debug
    processors:
      - drop_event:
          when.or:
            - equals:
                kubernetes.namespace: "kube-system"
            - equals:
                kubernetes.namespace: "default"
            - equals:
                kubernetes.namespace: "logging"
    output.logstash:
      hosts: ["logstash-service.logging:5044"]
      index: filebeat
      pretty: true
    setup.template.name: "filebeat"
    setup.template.pattern: "filebeat-*"

The error from the Filebeat logs -

fatal error: concurrent map read and map write

goroutine 455764 [running]:
runtime.throw(0x29bcfcc, 0x21)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc01836d330 sp=0xc01836d300 pc=0xedc6b2
runtime.mapaccess2_faststr(0x26e44e0, 0xc0017c1920, 0xc00d4ea170, 0xe, 0xc00e351e90, 0x0)
        /usr/local/go/src/runtime/map_faststr.go:116 +0x47c fp=0xc01836d3a0 sp=0xc01836d330 pc=0xebb04c
github.com/elastic/beats/v7/libbeat/common/kubernetes/k8skeystore.(*KubernetesKeystoresRegistry).GetKeystore(0xc001e53680, 0xc01205f980, 0x28a5aa0, 0xc01836d4b0)
        /go/src/github.com/elastic/beats/libbeat/common/kubernetes/k8skeystore/kubernetes_keystore.go:79 +0x110 fp=0xc01836d478 sp=0xc01836d3a0 pc=0x2375330
github.com/elastic/beats/v7/libbeat/autodiscover/template.Mapper.GetConfig(0xc000010ed8, 0x1, 0x1, 0x2cf31e0, 0xc00004ed80, 0x2caf8e0, 0xc001e53680, 0xc01205f980, 0xc000541500, 0x7f68ad5767d0, ...)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/template/config.go:95 +0x413 fp=0xc01836d550 sp=0xc01836d478 pc=0x17009e3
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*Provider).publish(0xc000d0c2c0, 0xc01205f980)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go:143 +0xae fp=0xc01836d620 sp=0xc01836d550 pc=0x237739e
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*Provider).publish-fm(0xc01205f980)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go:141 +0x34 fp=0xc01836d640 sp=0xc01836d620 pc=0x237fe74
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).emitEvents(0xc00060bf80, 0xc014ad0400, 0x29855ef, 0x4, 0xc006642160, 0x1, 0x1, 0xc000f44580, 0x1, 0x1)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:428 +0xa82 fp=0xc01836df50 sp=0xc01836d640 pc=0x237bf82
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).emit(0xc00060bf80, 0xc014ad0400, 0x29855ef, 0x4)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:270 +0x95 fp=0xc01836dfb0 sp=0xc01836df50 pc=0x237b475
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).OnUpdate.func1()
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:142 +0x48 fp=0xc01836dfe0 sp=0xc01836dfb0 pc=0x237fc48
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc01836dfe8 sp=0xc01836dfe0 pc=0xf102c1
created by time.goFunc
        /usr/local/go/src/time/sleep.go:168 +0x44

The full log actually contains around 2 million (20 lakh) lines, so I have pasted only a part of it.
Partial log - The Go Playground

Could you please post the stack trace here? We will verify whether the issue has already been fixed; if so, you should update Filebeat to a more recent version.

I have deployed Filebeat -> Logstash -> Elasticsearch -> Kibana on OKD 3.11 running on OpenStack CentOS 7.6 VMs, using the OSS Docker images for the deployment.

docker.elastic.co/beats/filebeat-oss:7.9.3
docker.elastic.co/logstash/logstash-oss:7.9.3
docker.elastic.co/elasticsearch/elasticsearch-oss:7.9.3
docker.elastic.co/kibana/kibana-oss:7.9.3

Please let me know if you need any more details.

Thanks for the information, but I was asking for more details about the fatal error: concurrent map read and map write.
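
For context, that message comes from the Go runtime itself: it aborts the process whenever it detects one goroutine reading a map while another goroutine writes to it without any locking, so it points at a concurrency bug in the code rather than at your configuration or resources. A minimal standalone sketch (illustration only, not Filebeat code) that reproduces the same failure mode:

    // Illustration only (not Filebeat code): the Go runtime aborts when a map
    // is read and written from different goroutines without synchronization.
    package main

    import "fmt"

    func main() {
        m := map[string]int{"key": 0}

        // Writer goroutine: keeps mutating the map with no lock.
        go func() {
            for i := 0; ; i++ {
                m["key"] = i
            }
        }()

        // Reader: keeps reading the same map with no lock. The runtime's
        // lightweight check eventually aborts the whole process with
        // "fatal error: concurrent map read and map write".
        total := 0
        for i := 0; i < 1_000_000_000; i++ {
            total += m["key"]
        }
        fmt.Println(total) // in practice never reached; the process dies first
    }

Running this aborts with the same banner followed by a goroutine stack trace, which is why the stack trace is the part we need in order to match it against known issues.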

@mtojek I have updated the question with the Filebeat configuration and the error log from the Filebeat pods. The log is huge (around 80 MB), so I added only the part around the error.

Please copy just the part of it which is similar to: The Go Playground (click Run).

I copied a thousand lines around the error and updated them in the question.
https://play.golang.org/p/AD0jCZGJqKB

Thanks, this is the part I was looking for:

fatal error: concurrent map read and map write

goroutine 55933 [running]:
runtime.throw(0x29bcfcc, 0x21)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc004797330 sp=0xc004797300 pc=0xedc6b2
runtime.mapaccess2_faststr(0x26e44e0, 0xc0009ccc30, 0xc003c50d90, 0xe, 0xc003d37480, 0x0)
        /usr/local/go/src/runtime/map_faststr.go:116 +0x47c fp=0xc0047973a0 sp=0xc004797330 pc=0xebb04c
github.com/elastic/beats/v7/libbeat/common/kubernetes/k8skeystore.(*KubernetesKeystoresRegistry).GetKeystore(0xc0009cea40, 0xc0057bf710, 0x28a5aa0, 0xc002f7b458)
        /go/src/github.com/elastic/beats/libbeat/common/kubernetes/k8skeystore/kubernetes_keystore.go:79 +0x110 fp=0xc004797478 sp=0xc0047973a0 pc=0x2375330
github.com/elastic/beats/v7/libbeat/autodiscover/template.Mapper.GetConfig(0xc000010438, 0x1, 0x1, 0x2cf31e0, 0xc00004f880, 0x2caf8e0, 0xc0009cea40, 0xc0057bf710, 0xc000ab6380, 0x7fe7606dd560, ...)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/template/config.go:95 +0x413 fp=0xc004797550 sp=0xc004797478 pc=0x17009e3
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*Provider).publish(0xc000566160, 0xc0057bf710)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go:143 +0xae fp=0xc004797620 sp=0xc004797550 pc=0x237739e
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*Provider).publish-fm(0xc0057bf710)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go:141 +0x34 fp=0xc004797640 sp=0xc004797620 pc=0x237fe74
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).emitEvents(0xc00053ee00, 0xc0023e2400, 0x29855ef, 0x4, 0xc001bc74a0, 0x1, 0x1, 0xc000a01b80, 0x1, 0x1)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:428 +0xa82 fp=0xc004797f50 sp=0xc004797640 pc=0x237bf82
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).emit(0xc00053ee00, 0xc0023e2400, 0x29855ef, 0x4)
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:270 +0x95 fp=0xc004797fb0 sp=0xc004797f50 pc=0x237b475
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).OnDelete.func1()
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:174 +0x58 fp=0xc004797fe0 sp=0xc004797fb0 pc=0x237fd18
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc004797fe8 sp=0xc004797fe0 pc=0xf102c1
created by time.goFunc
        /usr/local/go/src/time/sleep.go:168 +0x44

It seems that the issue has been fixed in a newer version: [Filebeat] Panic in K8s autodiscover · Issue #21843 · elastic/beats · GitHub

Could you please use a newer filebeat (at least 7.10.0)?
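
For what it is worth, the usual remedy for this class of crash is to guard the shared map with a lock so concurrent lookups and inserts are safe. A generic sketch of that pattern (an illustration of the technique only, not the actual Beats fix; the Registry and Get names below are just for demonstration):

    // Generic sketch (not the actual Beats fix): guard a shared registry map
    // with a sync.RWMutex so concurrent access from many goroutines is safe.
    package main

    import (
        "fmt"
        "sync"
    )

    type Keystore struct{ namespace string }

    type Registry struct {
        mu     sync.RWMutex
        stores map[string]*Keystore // keyed by namespace
    }

    func NewRegistry() *Registry {
        return &Registry{stores: make(map[string]*Keystore)}
    }

    // Get returns the keystore for a namespace, creating it on first use.
    func (r *Registry) Get(namespace string) *Keystore {
        // Fast path: shared read lock for the common lookup.
        r.mu.RLock()
        ks, ok := r.stores[namespace]
        r.mu.RUnlock()
        if ok {
            return ks
        }

        // Slow path: exclusive lock to create the entry, re-checking in case
        // another goroutine created it after we dropped the read lock.
        r.mu.Lock()
        defer r.mu.Unlock()
        if ks, ok := r.stores[namespace]; ok {
            return ks
        }
        ks = &Keystore{namespace: namespace}
        r.stores[namespace] = ks
        return ks
    }

    func main() {
        r := NewRegistry()
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 1000; j++ {
                    _ = r.Get("kube-system") // safe under concurrent access
                }
            }()
        }
        wg.Wait()
        fmt.Println("done without a concurrent map fault")
    }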

Sure, I will give it a try and let you know. Thanks for the information.

The Filebeat pods are still restarting with the following error even after updating the ELK stack and Filebeat to v7.10.2.

2021-03-06T11:46:39.829Z        DEBUG   [input] input/input.go:139      Run input
2021-03-06T11:46:39.829Z        DEBUG   [input] log/input.go:205        Start next scan
2021-03-06T11:46:39.830Z        DEBUG   [input] log/input.go:439        Check file for harvesting: /var/log/containers/1615031071454-ne-backup-1615031071454-dz52x_task-execution_tes-job-ctr-ffaaen22-1615031073017-572e5c521cc95667c7a75a5903e13b12f5c8c6cdbe18e42005602986eaee8c07.log
2021-03-06T11:46:39.830Z        DEBUG   [input] log/input.go:530        Update existing file for harvesting: /var/log/containers/1615031071454-ne-backup-1615031071454-dz52x_task-execution_tes-job-ctr-ffaaen22-1615031073017-572e5c521cc95667c7a75a5903e13b12f5c8c6cdbe18e42005602986eaee8c07.log, offset: 1115
2021-03-06T11:46:39.830Z        DEBUG   [input] log/input.go:582        Harvester for file is still running: /var/log/containers/1615031071454-ne-backup-1615031071454-dz52x_task-execution_tes-job-ctr-ffaaen22-1615031073017-572e5c521cc95667c7a75a5903e13b12f5c8c6cdbe18e42005602986eaee8c07.log
2021-03-06T11:46:39.830Z        DEBUG   [input] log/input.go:226        input states cleaned up. Before: 1, After: 1, Pending: 0
panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod

goroutine 71680 [running]:
github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes.(*pod).OnDelete.func1()
        /go/src/github.com/elastic/beats/libbeat/autodiscover/providers/kubernetes/pod.go:174 +0x7c
created by time.goFunc
        /usr/local/go/src/time/sleep.go:168 +0x44

This contains only a portion of the filebeat log - filebeat_DeletedFinalStateUnknown - 2e44ff3d

Please let me know if this could be due to any resource limitations. I have configured 4 GB RAM and 4 CPU cores for each Filebeat pod.
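
For reference, this second panic is a type-assertion failure rather than a resource problem: the delete handler assumes every deletion event carries a *v1.Pod, but client-go can instead deliver a cache.DeletedFinalStateUnknown tombstone when the watch missed the object's final state, which would be consistent with the master API and etcd restarts mentioned earlier. A generic sketch of the defensive pattern such handlers typically use (an illustration only, not the actual Beats code; the onPodDelete handler below is hypothetical):

    // Generic sketch (not the actual Beats code) of the defensive pattern for
    // informer delete handlers: a deletion event may carry a
    // cache.DeletedFinalStateUnknown tombstone instead of the object itself.
    // onPodDelete is a hypothetical handler used only for demonstration.
    package main

    import (
        "fmt"

        v1 "k8s.io/api/core/v1"
        "k8s.io/client-go/tools/cache"
    )

    func onPodDelete(obj interface{}) {
        pod, ok := obj.(*v1.Pod)
        if !ok {
            // The watch missed the final state, so the object arrives wrapped
            // in a tombstone; asserting *v1.Pod directly would panic exactly
            // like the log above.
            tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
            if !ok {
                fmt.Printf("unexpected object type on delete: %T\n", obj)
                return
            }
            pod, ok = tombstone.Obj.(*v1.Pod)
            if !ok {
                fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
                return
            }
        }
        fmt.Printf("pod deleted: %s/%s\n", pod.Namespace, pod.Name)
    }

    func main() {
        // Simulate the two shapes a delete event can take.
        onPodDelete(&v1.Pod{})
        onPodDelete(cache.DeletedFinalStateUnknown{Key: "default/example", Obj: &v1.Pod{}})
    }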
