Strange heartbeat errors


(Phil) #1

Hi,

I am using ELK 6.2.2 with all Beats.

When I run Heartbeat against a single node, it works without any errors.

But when I run Heartbeat against a cluster, I see a long list of logs and Heartbeat stops.

Not sure where to locate the error.

There are lots of "goroutine" messages.

A sample is given below, as there is a character limit here.

goroutine 124 [runnable]:
internal/poll.runtime_pollWait(0x231c900, 0x77, 0xc0426520b8)
        /usr/local/go/src/runtime/netpoll.go:173 +0x5e
internal/poll.(*pollDesc).wait(0xc042652158, 0x77, 0xc042393400, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xb5
internal/poll.(*ioSrv).ExecIO(0x16e1098, 0xc0426520b8, 0x107d860, 0xc0424a2db0, 0xc0424a2db8, 0xc04
        /usr/local/go/src/internal/poll/fd_windows.go:205 +0x13a
internal/poll.(*FD).ConnectEx(0xc042652000, 0x1684620, 0xc0423fa840, 0xc04242b020, 0xc042652000)
        /usr/local/go/src/internal/poll/fd_windows.go:757 +0x80

Any idea how I can solve this and get Heartbeat running again with the cluster?

Thanks.


(Adrian Serrano) #2

Can you please provide the full output from heartbeat? Just a single goroutine block is not enough to diagnose the problem.

Also, paste your configuration (heartbeat.yml) so we can understand your setup.

Thanks!


(Phil) #3

Hi Adrian,

Thanks for the reply. Here is my heartbeat.yml:

heartbeat.monitors:
- type: icmp 
  schedule: '*/5 * * * * * *' 
  hosts: ["localhost:9201", "4Q******", "5L******", "X3******"]

  ipv4: true
  ipv6: true
  mode: any
  timeout: 1s
  wait: 1s
- type: tcp
  schedule: '@every 5s' 

  hosts: ["localhost:9201", "4Q******", "5L******", "X3******"]

  ipv4: true
  ipv6: true
  mode: any
  ports: [80, 9201, 5044, 8081]

- type: http 
  schedule: '@every 5s' 
  urls: ["http://localhost:9201"]
  ipv4: true
  ipv6: true
  mode: any
setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
setup.dashboards.enabled: false  ## error in retrieving the json file from kibana folder ---
setup.dashboards.directory: kibana
setup.kibana:
  host: "localhost:5601"
  protocol: "http"
  username: "username"
  password: "password"
  bulk_max_size: 2048
output.logstash:
  hosts: ["localhost:5044"]

Output

C:************\heartbeat>heartbeat.exe -e -c heartbeat.yml
2018-06-04T11:55:05.084-0400    INFO    instance/beat.go:468    Home path: [C:************\heartbeat] Config path: [C:************\heartbeat] Data path: [C:************be
ta] Logs path: [C:************\heartbeat\logs]
2018-06-04T11:55:05.088-0400    INFO    instance/beat.go:475    Beat UUID: 549b2f84-9c74-4b83-8894-56b2d7fbd6b9
2018-06-04T11:55:05.088-0400    INFO    instance/beat.go:213    Setup Beat: heartbeat; Version: 6.2.2
2018-06-04T11:55:05.093-0400    INFO    pipeline/module.go:76   Beat name: ************
2018-06-04T11:55:05.094-0400    WARN    beater/heartbeat.go:24  Beta: Heartbeat is beta software
2018-06-04T11:55:05.094-0400    INFO    beater/manager.go:110   Select (active) monitor icmp
2018-06-04T11:55:05.095-0400    INFO    beater/manager.go:110   Select (active) monitor tcp
2018-06-04T11:55:05.095-0400    INFO    beater/manager.go:110   Select (active) monitor http
2018-06-04T11:55:05.100-0400    INFO    instance/beat.go:301    heartbeat start running.
2018-06-04T11:55:05.100-0400    INFO    [monitoring]    log/log.go:97   Starting metrics logging every 30s
2018-06-04T11:55:05.101-0400    INFO    beater/heartbeat.go:56  heartbeat is running! Hit CTRL-C to stop it.
2018-06-04T11:55:15.113-0400    INFO    scheduler/scheduler.go:294      Scheduled job 'tcp-tcp@5*****:[80 9201 5044 8081]' already active.
2018-06-04T11:55:15.114-0400    INFO    scheduler/scheduler.go:294      Scheduled job 'http@http://localhost:9201' already active.
2018-06-04T11:55:20.114-0400    INFO    scheduler/scheduler.go:294      Scheduled job 'tcp-tcp@5******:[80 9201 5044 8081]' already active.
2018-06-04T11:55:20.114-0400    INFO    scheduler/scheduler.go:294      Scheduled job 'http@http://localhost:9201' already active.
2018-06-04T11:55:25.115-0400    INFO    scheduler/scheduler.go:294      Scheduled job 'tcp-tcp@5*****:[80 9201 5044 8081]' already active.
2018-06-04T11:55:25.116-0400    INFO    scheduler/scheduler.go:294      Scheduled job 'http@http://localhost:9201' already active.
fatal error: concurrent map writes

goroutine 105 [running]:
runtime.throw().........
        github.com/elastic/beats/libbeat/common.MapStr.DeepUpdate
    github.com/elastic/beats/libbeat/common.deepUpdateValue
     ............  
        /go/src/github.com/elastic/beats/heartbeat/beater/manager.go:287 ....
        /go/src/github.com/....cheduler/scheduler.go:312 
goroutine 1 [chan receive]:
github.com/elastic/beats/heartbeat/beater.(*Heartbeat).Run()
        /go/src/github.c..../beater/heartbeat.go:63 
github.c.../instance.(*Beat).launch()
       ........(*Command).execute()
        /go/src/gith...om/spf13/cobra/command.go:704 
github.com/....com/spf13/cobra.....
main.main()
        /go/src/.....s/heartbeat/main.go:10 
goroutine 35 [syscall]:
os/signal.signal_recv()
        /usr/..sigqueue.go:131 
  ....

goroutine 60 [chan receive]:
github.com/elastic/..../elastic/go-lumber/client/v2.(*AsyncClient).ackLoop()....
        /go/src/.../github.com/elastic/go-.......(*AsyncClient).startACK
.
goroutine 67 [runnable]:
github.com/elastic/.../memqueue.(*bufferingEventLoop).run()
        /go/src/github.com/..../memqueue/eventloop.go:299    .....
goroutine 68 [select]:
github.com/elastic/..../memqueue.(*ackLoop).run(...)
        /go/src/github.com/..../queue/memqueue/ackloop.go:43 
   .....

goroutine 69 [select]:
github.com/.../pipeline.(*eventConsumer).loop(...)
   .......
goroutine 70 [select]:
github.com/..../pipeline.(*retryer).loop(....)
   .....
goroutine 71 [chan send]:
github.com/.../elastic/go-lumber/client/v2.(*AsyncClient).Send(....)
       .......

goroutine 15 [IO wait]:
internal/poll.runtime_pollWait(...)
       ......
github.com/.../x/net/icmp.(*PacketConn).ReadFrom(...)
        /go/src/github.com/e...g/x/net/icmp/endpoint.go:59 
github.com/elastic/..../active/icmp.(*icmpLoop).runICMPRecv(...)
        /go/src/github.com..../active/icmp/loop.go:123 
created by github.com/..../active/icmp.newICMPLoop
        /go/src/.../monitors/active/icmp/loop.go:87 

goroutine 16 [runnable]:
internal/poll.runtime_pollWait(...)
        /usr.../netpoll.go:173 
 .....

goroutine 81 [select]:
github.com.../monitoring/report/log.........

goroutine 82 [chan receive]:
github.com/elastic/...rvice.HandleSignals.func1(....)
    .....

goroutine 83 [chan receive]:
github.com/..libbeat/service.(*beatService).Execute(.....)
......
/go/src..../x/sys/windows/svc/debug/service.go:40 
......

goroutine 63 [select]:
github.com/elastic/be...../svc/debug.Run.func1(.....)
        /go/src/github.com....ws/svc/debug/service.go:32 
 ......

goroutine 64 [runnable]:
github.com/elastic/beats/heartbeat/scheduler.(*Scheduler).run(...)
        /go/src/github...../scheduler.go:183 
 ......

goroutine 165 [IO wait]:
internal/poll.runtime_pollWait(...)
        /usr/local/go/src/runtime/netpoll.go:173 
internal/poll.(*pollDesc).wait(..)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 
internal/poll.(*ioSrv).ExecIO(...)
   .....
github.com/e..../elastic/go-lumber/client/v2.(*AsyncClient).ackLoop()
        /go/src/..../github.com/elastic/go-lumber/client/v2/async.go:156 
created by github.com/...../elastic/go-lumber/client/v2.(*AsyncClient).startACK
        /go/src/github.com.../go-lumber/client/v2/async.go:123 

goroutine 192 [runnable]:
net.(*netFD).connect.func2(...)
src/net/fd_windows.go:105 
 ......

I had to cut the output down because of the limit:
"Body is limited to 7000 characters; you entered 32031."

Thanks.


(Adrian Serrano) #4

This bug has been fixed recently; the fix will be included in the next release (6.3.0).


(Phil) #5

Hi Adrian,

thanks for the reply. Is there any workaround for this? (for 6.2.2)

Phil


(Adrian Serrano) #6

No; your best option is to use a snapshot version until the new release comes out:

https://s3-us-west-2.amazonaws.com/beats-package-snapshots/index.html?prefix=heartbeat/

or a 6.3.0 build candidate.


(Phil) #7

Hi Adrian,

Thanks for the reply.

I also noticed that if I go back to a single node, the same error happens. Does this bug affect both situations (single node and cluster)?

When will the new version be released?

Once again thanks.

Phil


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.