Logstash Beats Pod stopped servicing our requests

Logstash 6.6.1 running in Kubernetes 1.12, with only one ReplicaSet, which receives both Metricbeat and Filebeat data.

This morning we noticed an ongoing gap in the incoming Metricbeat data. Looking at the Logstash instance in Rancher, it showed no sign of trouble - it was green - but the internal pipeline processing had somehow stopped until we restarted it.

After we restarted the instance (deleted it in Rancher, which caused it to be re-created), the data started flowing once again. Looking at the logs post mortem, the only thing of interest I found is the stack trace below. I don't know whether that is the reason the Logstash pipeline processing halted, because its logging level is only WARN.

Questions:

  1. Would the error message below permanently stop Logstash's main pipeline processing?
  2. If so, how could we set up the liveness probe to detect that condition? (A rough sketch of what we have in mind follows this list.)
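
For question 2, the kind of probe we are thinking about is sketched below. This is a hypothetical sketch, not our current manifest: it assumes the monitoring API is reachable on localhost:9600 inside the pod, that jq is available in the image, and that the main pipeline always has some Beats traffic; the JSON paths come from the stats output further down, and the timings are placeholders.

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - |
        # Read the main pipeline's cumulative "out" counter from the stats API.
        out=$(curl -s localhost:9600/_node/stats/pipelines | jq -r '.pipelines.main.events.out')
        # Compare with the value stored by the previous probe run.
        prev=$(cat /tmp/last_events_out 2>/dev/null || echo "none")
        echo "$out" > /tmp/last_events_out
        # Healthy (exit 0) only if the counter has moved since the last check;
        # an unchanged counter means the pipeline stopped processing events.
        [ "$out" != "$prev" ]
  initialDelaySeconds: 120
  periodSeconds: 300
  failureThreshold: 2

One caveat with this approach: it would also restart the pod when there is legitimately no incoming Beats traffic for a whole probe period, so the period and threshold would need to be larger than any expected quiet window.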

Here is the output of the node/pipelines setup on that Pod:

bash-4.2$ curl -XGET 'localhost:9600/_node/pipelines?pretty'

{
  "host" : "logstash-beats-0",
  "version" : "6.6.1",
  "http_address" : "0.0.0.0:9600",
  "id" : "14d98163-eb19-4adb-a6a6-8c1d4aeba54d",
  "name" : "logstash-beats-0",
  "pipelines" : {
    ".monitoring-logstash" : { },
    "main" : {
      "workers" : 1,
      "batch_size" : 125,
      "batch_delay" : 50,
      "config_reload_automatic" : true,
      "config_reload_interval" : 3000000000,
      "dead_letter_queue_enabled" : false
    }
  }
}

Here is the Pipeline stats output:
bash-4.2$ curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'

{
  "host": "logstash-beats-0",
  "version": "6.6.1",
  "http_address": "0.0.0.0:9600",
  "id": "14d98163-eb19-4adb-a6a6-8c1d4aeba54d",
  "name": "logstash-beats-0",
  "pipelines": {
    ".monitoring-logstash": {
      "events": null,
      "plugins": {
        "inputs": [],
        "filters": [],
        "outputs": []
      },
      "reloads": {
        "last_error": null,
        "successes": 0,
        "last_success_timestamp": null,
        "last_failure_timestamp": null,
        "failures": 0
      },
      "queue": null
    },
    "main": {
      "events": {
        "duration_in_millis": 870180,
        "in": 775123,
        "out": 775240,
        "filtered": 775240,
        "queue_push_duration_in_millis": 1930376
      },
      "plugins": {
        "inputs": [{
          "id": "0a2584b61bf0ea93b836354c8034d7c7f532fbaa7afad176b4887d69a43eb351",
          "events": {
            "out": 775123,
            "queue_push_duration_in_millis": 1930376
          },
          "current_connections": 108,
          "name": "beats",
          "peak_connections": 435
        }],
        "filters": [],
        "outputs": [{
          "id": "9e6ce3b6bfe7511bfe1aa997d116005151cbec8f3dc1917f9b6175529b75eaee",
          "documents": {
            "successes": 775240
          },
          "events": {
            "duration_in_millis": 803923,
            "in": 775240,
            "out": 775240
          },
          "bulk_requests": {
            "successes": 36800,
            "responses": {
              "200": 36800
            }
          },
          "name": "elasticsearch"
        }]
      },
      "reloads": {
        "last_error": null,
        "successes": 0,
        "last_success_timestamp": null,
        "last_failure_timestamp": null,
        "failures": 0
      },
      "queue": {
        "events": 0,
        "type": "persisted",
        "capacity": {
          "queue_size_in_bytes": 14153195,
          "page_capacity_in_bytes": 67108864,
          "max_queue_size_in_bytes": 1073741824,
          "max_unread_events": 0
        },
        "data": {
          "path": "/usr/share/logstash/data/queue/main",
          "free_space_in_bytes": 8283140096,
          "storage_type": "ext4"
        }
      }
    }
  }
}
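
For what it's worth, the fields in the output above that such a probe could key on are .pipelines.main.events.out and the beats input's current_connections. A quick manual check along these lines (jq assumed available in the container; the interval is arbitrary) shows whether events are still moving while Beats clients remain connected:

# Sample the main pipeline's "out" counter twice, plus the open beats connections.
first=$(curl -s localhost:9600/_node/stats/pipelines | jq '.pipelines.main.events.out')
sleep 60
second=$(curl -s localhost:9600/_node/stats/pipelines | jq '.pipelines.main.events.out')
conns=$(curl -s localhost:9600/_node/stats/pipelines | jq '.pipelines.main.plugins.inputs[0].current_connections')
# If "out" has not moved while connections are still open, the stall is inside Logstash,
# not on the Beats side.
echo "events.out: $first -> $second, open beats connections: $conns"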

Here is a partial stack-trace:
[2019-06-20T18:31:11,658][WARN ][io.netty.util.concurrent.SingleThreadEventExecutor] An event executor terminated with non-empty task queue (15)
[2019-06-20T18:31:11,709][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>25, "name"=>"[main]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in run'"}, {"thread_id"=>24, "name"=>"[main]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/pipeline.rb:333:in read_batch'"}]}}
[2019-06-20T18:31:11,730][INFO ][org.logstash.beats.BeatsHandler] [local: 0.0.0.0:5044, remote: undefined] Handling exception: event executor terminated
[2019-06-20T18:31:11,731][WARN ][io.netty.channel.AbstractChannelHandlerContext] Failed to submit an exceptionCaught() event.
java.util.concurrent.RejectedExecutionException: event executor terminated
at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:855) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:778) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:267) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at org.logstash.beats.BeatsHandler.exceptionCaught(BeatsHandler.java:91) [logstash-input-beats-5.1.8.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:400) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at
