Filebeat 32-bit crash in Windows environment

(Allen) #1

It seems the filebeat I installed in Window environment keep crashing before send any log files into logstash.

currently I am using filebeat version 1.1.1 and later tested with 1.2.3 however the results are the same.

Here is the my config file:

  # List of prospectors to fetch data.
        - "//test/temp/Spark/27/*.log"
        - "//test/temp/Spark/28/*.log"
      input_type: log
      encoding: utf-8
        pattern: "Time of Log: [[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}"
        negate: true
        match: after
        max_lines: 8000 
      exclude_lines: ["\\*\\*Event:"] 
      timeout: 60s
      document_type: spark_client_log
      exclude_files: [".TXT$", ".XML$", ".msi$", ".exe$"]
      scan_frequency: 15s
      backoff: 30s

      max_backoff: 200s
      ignore_older: 240h 
  spool_size: 50
  registry_file: "C:/ProgramData/filebeat/sparkclientregistry"
  idle_timeout: 50s


  ### Logstash as output
    # The Logstash hosts
    hosts: ["localhost:5044"]

And output from my debug window captured this:

**Event: Long Running Process
2016/08/16 01:15:34.011052 reader.go:138: DBG  End of file reached: \\lfvicprdw033\temp\Spark\27\Spark_LNSP0092_Nick Pir
piris_Network Timeout_201608131154219303.log; Backoff now.
2016/08/16 01:15:34.016053 prospector.go:364: DBG  Launching harvester on new file: \\server033\temp\Spark\27\M
P0138_tester_Long Running Process_201608090909547751.log
2016/08/16 01:15:34.017033 prospector.go:259: DBG  Check file for harvesting: \\server033\temp\Spark\27\MP013
8_tester_Long Running Process_201608090913499135.log
fatal error: runtime: failed to release pages
2016/08/16 01:15:34.018033 prospector.go:315: DBG  Start harvesting unknown file: \\lfvicprdw033\temp\Spark\27\Long Running Process_201608090913499135.log

runtime stack:
runtime.throw(0x86e080, 0x20)
        /usr/local/go/src/runtime/panic.go:527 +0x7f
runtime.sysFree(0x0, 0x10000000, 0xf5fdb0)
        /usr/local/go/src/runtime/mem_windows.go:86 +0x5b
runtime.mHeap_SysAlloc(0xac5e60, 0x100000, 0xf5fee8)
        /usr/local/go/src/runtime/malloc.go:418 +0x4a1
runtime.mHeap_Grow(0xac5e60, 0x8, 0x0)
        /usr/local/go/src/runtime/mheap.go:628 +0x52
runtime.mHeap_AllocSpanLocked(0xac5e60, 0x1, 0x43199364)
        /usr/local/go/src/runtime/mheap.go:532 +0x574
runtime.mHeap_Alloc_m(0xac5e60, 0x1, 0x1f, 0x127c4500, 0x43199364)
        /usr/local/go/src/runtime/mheap.go:425 +0x281
        /usr/local/go/src/runtime/mheap.go:484 +0x3d
        /usr/local/go/src/runtime/asm_386.s:283 +0x81
runtime.mHeap_Alloc(0xac5e60, 0x1, 0x1f, 0x27f0100, 0xc4f5)
        /usr/local/go/src/runtime/mheap.go:485 +0x5a
runtime.mCentral_Grow(0xacbe90, 0x0)
        /usr/local/go/src/runtime/mcentral.go:190 +0x8e
runtime.mCentral_CacheSpan(0xacbe90, 0x62e417f8)
        /usr/local/go/src/runtime/mcentral.go:86 +0x439
runtime.mCache_Refill(0x1904a8, 0x1f, 0x3)
        /usr/local/go/src/runtime/mcache.go:118 +0xae
        /usr/local/go/src/runtime/malloc.go:614 +0x2b
        /usr/local/go/src/runtime/asm_386.s:267 +0x57

goroutine 14640 [running]:
        /usr/local/go/src/runtime/asm_386.s:222 fp=0x20945cb8 sp=0x20945cb4
runtime.mallocgc(0x400, 0x73d900, 0x1, 0x400)
        /usr/local/go/src/runtime/malloc.go:615 +0x65a fp=0x20945d20 sp=0x20945cb8
runtime.newarray(0x73d900, 0x400, 0x400)
        /usr/local/go/src/runtime/malloc.go:780 +0xbc fp=0x20945d44 sp=0x20945d20
runtime.makeslice(0x732b40, 0x400, 0x0, 0x400, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/slice.go:32 +0x180 fp=0x20945d84 sp=0x20945d44*LineReader).decode(0x1ad66e80, 0x70, 0x37bf34e0, 0x0, 0x0)
        /go/src/ +0x5f fp=0x20945dfc sp=0x20945d84*LineReader).advance(0x1ad66e80, 0x0, 0x0)
        /go/src/ +0x9a fp=0x20945e98 sp=0x20945dfc*LineReader).Next(0x1ad66e80, 0x0, 0x0, 0x0, 0x13, 0x0, 0x0)

Could anyone please give me some hints for how to fix it.


(Steffen Siering) #2

panic happens when allocating memory from OS fails. I think we've seen this on 32bit versions.

The sysReserve call already fails returning a NULL-pointer. Checking the golang runtime code a NULL-pointer check has been introduced in go1.6. But even with newer go runtime version this still would fail, as runtime fails to reserve some virtual address space from OS.

Can you run filebeat 1.2.3 with -e -v -d "*" and post the log output here? I wonder if this will look different. Please use </>-button to format logs, config files and code to make it somewhat readable.

(Allen) #3

Thanks Steffens.

I run filebeat 1.2.3 this morning and here is the result of crashes(only a few lines at the beginning of the crashes.)

2016/08/16 22:36:17.441300 crawler.go:71: DBG  Registrar will re-save state for \\lfvicprdw033\temp\Spark\27\Network Timeout_201608121041347390.log
2016/08/16 22:36:17.438300 log.go:283: DBG  harvest: "\\\\lfvicprdw033\\temp\\Spark\\27\\Lon
g Running Process_201608091111403338.log" (offset snapshot:0)
2016/08/16 22:36:17.442300 util.go:20: DBG  full line read
2016/08/16 22:36:17.443300 log.go:197: DBG  Drop line as it does match one of the exclude patterns

**Event: Memory Usage
2016/08/16 22:36:17.443300 log.go:113: INFO Harvester started for file: \\lfvicprdw033\temp\Spark\27\Long Running Process_201608091111403338.log
2016/08/16 22:36:17.444300 prospector.go:399: DBG  Launching harvester on new file: \\lfvicprdw033\temp\Spark\27\Spark_L
NSP0303_Michael Theoharis_Network Timeout_201608121241599192.log
2016/08/16 22:36:17.444300 prospector.go:345: DBG  Start harvesting unknown file: \\lfvicprdw033\temp\Spark\27\Spark_LNS
P0321_Sarah Huon_Saving Local Context Error_201608091621422113.log
fatal error: runtime: cannot map pages in arena address space2016/08/16 22:36:17.446301 crawler.go:71: DBG  Registrar wi
ll re-save state for \\lfvicprdw033\temp\Spark\27\Spark_LNSP0303_Michael Theoharis_Network Timeout_201608121241599192.lo

runtime stack:
runtime.throw(0x893440, 0x30)
        /usr/local/go/src/runtime/panic.go:527 +0x7f
runtime.sysMap(0x0, 0x100000, 0x4583fd01, 0xadcf78)
        /usr/local/go/src/runtime/mem_windows.go:112 +0x78
runtime.mHeap_SysAlloc(0xacae60, 0x100000, 0x0)
        /usr/local/go/src/runtime/malloc.go:423 +0x121
runtime.mHeap_Grow(0xacae60, 0x8, 0x0)
        /usr/local/go/src/runtime/mheap.go:628 +0x52
runtime.mHeap_AllocSpanLocked(0xacae60, 0x1, 0x69b97234)
        /usr/local/go/src/runtime/mheap.go:532 +0x574
runtime.mHeap_Alloc_m(0xacae60, 0x1, 0x1f, 0x74989800, 0x69b97234)
        /usr/local/go/src/runtime/mheap.go:425 +0x281
        /usr/local/go/src/runtime/mheap.go:484 +0x3d
        /usr/local/go/src/runtime/asm_386.s:283 +0x81
runtime.mHeap_Alloc(0xacae60, 0x1, 0x1f, 0x27f0100, 0xbc46)
        /usr/local/go/src/runtime/mheap.go:485 +0x5a
runtime.mCentral_Grow(0xad0e90, 0x0)
        /usr/local/go/src/runtime/mcentral.go:190 +0x8e
runtime.mCentral_CacheSpan(0xad0e90, 0x689619b8)
        /usr/local/go/src/runtime/mcentral.go:86 +0x439
runtime.mCache_Refill(0x1806fc, 0x1f, 0x3)
        /usr/local/go/src/runtime/mcache.go:118 +0xae
        /usr/local/go/src/runtime/malloc.go:611 +0x2b
        /usr/local/go/src/runtime/asm_386.s:267 +0x57

goroutine 13619 [running]:
        /usr/local/go/src/runtime/asm_386.s:222 fp=0x4d542cb8 sp=0x4d542cb4
runtime.mallocgc(0x400, 0x73fc20, 0x1, 0x400)
        /usr/local/go/src/runtime/malloc.go:612 +0x65a fp=0x4d542d20 sp=0x4d542cb8
runtime.newarray(0x73fc20, 0x400, 0x400)
        /usr/local/go/src/runtime/malloc.go:777 +0xbc fp=0x4d542d44 sp=0x4d542d20
runtime.makeslice(0x734de0, 0x400, 0x0, 0x400, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/slice.go:32 +0x180 fp=0x4d542d84 sp=0x4d542d44*LineReader).decode(0x6969f8c0, 0x5a, 0x695c4ec0, 0x0, 0x0)
        /go/src/ +0x5f fp=0x4d542dfc sp=0x4d542d84*LineReader).advance(0x6969f8c0, 0x0, 0x0)
        /go/src/ +0x9a fp=0x4d542e98 sp=0x4d542dfc*LineReader).Next(0x6969f8c0, 0x0, 0x0, 0x0, 0x5b, 0x0, 0x0)
        /go/src/ +0x3d fp=0x4d542ef0 sp=0x4d542e98, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, ...)


(Allen) #4

Just an update of my issues.

I tried to use Filebeat 5.0.0-alpha5 (Windows 64 bits version) without any change of the config file, it all works fine!

It looks current filebeat production Windows version 1.2.3 also has this issue, it happens when trying to parse over certain amount of files, in my case it's over 300 files, and each event have multiple lines(some of my event over than 1000 lines). My filebeat version seems to crash and failed on allocate memory under Windows Server 2012 R2.

It seems Filebeat 5.0.0-alpha5 (Windows 64 bits version) has solved the problem.


(Steffen Siering) #5

yeah, 64bit binaries should help. Problem might be 32bit runtime trying to allocate more continuous virtual memory as supported by default on windows:

see error messages:

fatal error: runtime: cannot map pages in arena address space

this is virtual alloc reserving some address space for potential use, this is not about real memory being used.

For reference VirtualAlloc with MEM_COMMIT and PAGE_READWRITE fails.

(Allen) #6

I think you are right Seffens,

Have read the thread here:

I believe the issue is due to the max amount of files opening in file beat windows 32 bit version. In our case one log folder contains 11337 files and another log folder contains 29911 files. For the first-time parse this is an issue due to Window memory allocation failure with max threads exceed single process handling limit in Win 32 bits filebeat version.

It seems the 64 bits version has this issue fixed and our team decided to give it a go and see how it's going.

Would like to hear from ELK team how this issue get fixed in 64 bits version and when the production version of 64 bits could be released in a near future.


(Steffen Siering) #7

The 64bit version doesn't really behave different. The difference is with windows treating 32bit and 64bit binaries differently and applying other limits.

You have an estimate about number of files being handled concurrently (number of files open by filebeat)? Setting close_older to a few minutes will ensure filebeat closes files earlier.

Filebeat does not limit number of open files yet, as it is normally supposed to be a limited resource by the OS. Once out of available file-descriptors filebeat will just try and wait for resources being freed and continue processing.

Filebeat 5.0 will be released same date as elasticsearch, logstash, kibana 5.0. But I have no exact date for you yet.

Setting up a golang development environment you can rebuild filebeat 1.2 for 64bit windows yourself I think. But Filebeat has advanced this much since 1.x, personally I would prefer users going with current 5.0 alpha release then 1.x release. New beta phase (basically feature freeze) might start soonish (I hope, no actual date).

(system) #8

This topic was automatically closed after 21 days. New replies are no longer allowed.