Panic: fatal error: concurrent map iteration and map write

(Maxim Korolyov) #1


We currently run a separate Filebeat instance for each running container.
Filebeat writes directly to Elasticsearch.
After upgrading Filebeat from 5.6.14 to 6.4.2, it started crashing after running for some time with:

fatal error: concurrent map iteration and map write

    goroutine 60 [running]:
    runtime.throw(0x15dee00, 0x26)
           	/usr/local/go/src/runtime/panic.go:616 +0x81 fp=0xc420663640 sp=0xc420663620 pc=0x9b1ab1
           	/usr/local/go/src/runtime/hashmap.go:747 +0x55c fp=0xc4206636d0 sp=0xc420663640 pc=0x98fa7c, 0x1444680, 0xc4203c0660, 0xc4203c0660, 0x160fbc0)
           	/go/src/ +0xea fp=0xc420663778 sp=0xc4206636d0 pc=0xda1aba, 0x1543160, 0xc4203c0660, 0x0, 0x0)
           	/go/src/ +0x15b fp=0xc4206637f0 sp=0xc420663778 pc=0xd9fb4b, 0x1543160, 0xc42047b758, 0x95, 0x1543160, 0xc42047b758)
           	/go/src/ +0x153 fp=0xc4206638a8 sp=0xc4206637f0 pc=0xda2643, 0x14a9d00, 0xc42047b740, 0x99, 0x0, 0x0)
           	/go/src/ +0x86 fp=0xc4206638f8 sp=0xc4206638a8 pc=0xdf5d16, 0x14a9d00, 0xc42047b740, 0x99, 0x0, 0x14a9d00)
           	/go/src/ +0x88 fp=0xc420663950 sp=0xc4206638f8 pc=0xdf5b38, 0x14a9d00, 0xc42047b740, 0x99, 0x0, 0x0)
           	/go/src/ +0x95 fp=0xc420663998 sp=0xc420663950 pc=0xdf5a35, 0x14a9d00, 0xc42047b740, 0x99, 0x99, 0xc42005d000)
           	/go/src/ +0xb9 fp=0xc4206639d8 sp=0xc420663998 pc=0xda7799, 0x14a9d00, 0xc42047b740, 0xc42047b740, 0xc42047b740)
           	/go/src/ +0x1c9 fp=0xc420663a50 sp=0xc4206639d8 pc=0xd9fbb9*Iterator).Fold(0xc420339470, 0x14a9d00, 0xc42047b740, 0xc42047b740, 0x0)
           	/go/src/ +0x41 fp=0xc420663a88 sp=0xc420663a50 pc=0xd9f9c1*jsonEncoder).AddRaw(0xc420332d60, 0x148db00, 0xc42054b200, 0x0, 0x0)
           	/go/src/ +0x300 fp=0xc420663b88 sp=0xc420663a88 pc=0xe23d40*jsonEncoder).Add(0xc420332d60, 0x147c860, 0xc420193c00, 0x148db00, 0xc42054b200, 0xc420193c00, 0x0)
           	/go/src/ +0x8b fp=0xc420663be8 sp=0xc420663b88 pc=0xe23dfb, 0xc420332d60, 0x1684760, 0xc420192240, 0x0, 0xc42054b080, 0x32, 0x63e, 0xc4203b00c0, 0xc42008c3c0, ...)
           	/go/src/ +0x1ac fp=0xc420663ce8 sp=0xc420663be8 pc=0xe1dbcc*Client).publishEvents(0xc4201e6420, 0xc42054b080, 0x32, 0x63e, 0x0, 0x0, 0x0, 0x0, 0x0)
           	/go/src/ +0x14e fp=0xc420663e28 sp=0xc420663ce8 pc=0xe1d35e*Client).Publish(0xc4201e6420, 0x169f080, 0xc42040dc80, 0xc4200a4840, 0xc420663f18)
           	/go/src/ +0x43 fp=0xc420663e90 sp=0xc420663e28 pc=0xe1d173*backoffClient).Publish(0xc420332f20, 0x169f080, 0xc42040dc80, 0x0, 0x0)
           	/go/src/ +0x4b fp=0xc420663ed8 sp=0xc420663e90 pc=0xd6d3fb*netClientWorker).run(0xc420192780)
           	/go/src/ +0x324 fp=0xc420663fd8 sp=0xc420663ed8 pc=0xe8fe54
           	/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc420663fe0 sp=0xc420663fd8 pc=0x9e1891
    created by
           	/go/src/ +0xf0

(Steffen Siering) #2

This looks like a race condition due to some shared event fields being modified while being serialized.
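
For readers unfamiliar with this runtime error, the pattern described here can be sketched in Go roughly as follows. This is a minimal illustration, not Filebeat code: `sharedFields`, `encode`, `set`, and `runDemo` are hypothetical names, and the field values are made up. One goroutine iterates a map (as a JSON encoder does) while another writes to it. The sketch guards both paths with a `sync.RWMutex` so that it runs cleanly; without the locks, the Go runtime can abort with a "concurrent map" fatal error such as the one in the trace above, because its detection is best-effort and cannot be recovered from.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// sharedFields is a hypothetical stand-in for a field map that several
// events reference at the same time.
type sharedFields struct {
	mu sync.RWMutex
	m  map[string]interface{}
}

// encode iterates over the map, as a JSON encoder would during publishing.
func (s *sharedFields) encode() ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return json.Marshal(s.m)
}

// set writes to the map, as a merge of additional fields would.
func (s *sharedFields) set(key string, value interface{}) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// runDemo serializes and modifies the same map from two goroutines and
// returns the final JSON document.
func runDemo() string {
	s := &sharedFields{m: map[string]interface{}{"name": "host-a"}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // reader: repeated serialization
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			if _, err := s.encode(); err != nil {
				panic(err)
			}
		}
	}()
	go func() { // writer: repeated modification of the shared map
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			s.set("ip_address", "10.0.0.1")
		}
	}()
	wg.Wait()

	out, err := s.encode()
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(runDemo())
}
```

Locking is only one way to make the access safe; giving each event its own copy of the map avoids the shared state altogether.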

Can you share your configs? Which processors are you using? Did you configure fields?

(Maxim Korolyov) #3
    - prospector.type: log
            - "/home/engine/log/app/stderr.log*"
      fields_under_root: true
        host.ip_address: ${LOCAL_IP}
    - prospector.type: log
            - "/home/engine/log/app/stdout.log*"
      close_removed: true
      close_renamed: true
      json.keys_under_root: true
      json.add_error_key: true
      fields_under_root: true
        host.ip_address: ${LOCAL_IP}
    name: ${BEATNAME} "filebeat-6.4.2"
    setup.template.pattern: "filebeat-6.4.2-*"
    setup.template.json.enabled: true "filebeat-6.4.2"
    setup.template.json.path: "filebeat-6.4.2.template.json"
    setup.template.overwrite: false
      hosts: ["http://${ELASTICSEARCH}:9300/"]
      index: "filebeat-%{[beat.version]}-%{+yyyy.MM.dd}"

(Andrew Kroh) #4

This may have been fixed in 6.4.3. From the release notes:

  • Fix race condition when publishing monitoring data. 8646

(Maxim Korolyov) #5
  1. This fix was backported to older versions.
  2. As far as I understand, after digging into the stack traces and the code base, it is not connected.

(Steffen Siering) #6

This is indeed a bug we haven't seen yet. It is caused by writing to the host namespace via the fields setting. The host namespace is built-in and uses a shared structure that, it seems, is not always properly protected or copied. The structure provided via the fields setting is also shared and should not be modified. The race occurs when the built-in namespace modifies the provided structure.

Can you please open a bug report with version, configs and stack trace?

There might be a (not so intuitive) workaround: define a no-op global processor (e.g. drop a field that can't exist, or define a when clause that can never be satisfied):

# define global processor with condition that is always false:
- drop_event.when.and:
    - has_fields: ["random"]
    - not.has_fields: ["random"]

This should force a copy of the "host" namespace per event, such that it can be safely merged with the builtin field.
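
The idea of copying per event before merging can be sketched in Go as below. This is an illustration under assumptions, not the libbeat implementation: `deepCopy`, `mergeInto`, and `buildEvent` are hypothetical helpers, and the `name` value is invented. Each event gets its own deep copy of the configured fields, so merging the built-in "host" data never touches a map that another goroutine might be serializing.

```go
package main

import "fmt"

// deepCopy returns a copy of m in which nested maps are copied as well,
// so the result shares no map with the input.
func deepCopy(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m))
	for k, v := range m {
		if nested, ok := v.(map[string]interface{}); ok {
			out[k] = deepCopy(nested)
		} else {
			out[k] = v
		}
	}
	return out
}

// mergeInto writes the keys of src into dst, recursing into nested maps.
// Maps taken from src are copied, so dst never aliases src.
func mergeInto(dst, src map[string]interface{}) {
	for k, v := range src {
		srcMap, srcOK := v.(map[string]interface{})
		dstMap, dstOK := dst[k].(map[string]interface{})
		switch {
		case srcOK && dstOK:
			mergeInto(dstMap, srcMap)
		case srcOK:
			dst[k] = deepCopy(srcMap)
		default:
			dst[k] = v
		}
	}
}

// buildEvent creates per-event fields: a private copy of the configured
// fields with the built-in fields merged on top.
func buildEvent(configured, builtin map[string]interface{}) map[string]interface{} {
	event := deepCopy(configured)
	mergeInto(event, builtin)
	return event
}

func main() {
	// Configured fields, shared by all events (value is illustrative).
	configured := map[string]interface{}{
		"host": map[string]interface{}{"ip_address": "10.0.0.1"},
	}
	// Built-in fields added by the shipper (value is illustrative).
	builtin := map[string]interface{}{
		"host": map[string]interface{}{"name": "node-1"},
	}

	event := buildEvent(configured, builtin)
	fmt.Println(event["host"])      // merged, private to this event
	fmt.Println(configured["host"]) // shared input, left unmodified
}
```

The cost is one extra allocation per event; in exchange, neither the configured nor the built-in structure is ever written to after startup.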

(Maxim Korolyov) #7

Thanks, created

(Steffen Siering) #8

Thank you.