Logstash 6.6.1 running in Kubernetes 1.12, with only one ReplicaSet, which receives both Metricbeat and Filebeat data.
This morning we noticed a continuous gap in the incoming Metricbeat data. Looking at the Logstash instance in Rancher, there was no sign of trouble: the Pod was green, but the internal pipeline processing had somehow stopped until we restarted it.
After we restarted the instance (deleted it in Rancher, which caused it to be re-created), the data started flowing once again. Looking at the logs post-mortem, the only thing of interest I found is the stack trace below. I don't know whether it is the reason the pipeline processing halted, because its logging level is only WARN.
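For the record, the restart was nothing more elaborate than deleting the Pod and letting Kubernetes re-create it, roughly the equivalent of:

# namespace is a placeholder; the Pod name comes from the node API output below
kubectl delete pod logstash-beats-0 -n <namespace>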
Questions:
- Would the error below permanently stop the Logstash main pipeline processing?
- If so, how could we set up a liveness probe to detect that condition? (A sketch of what we have in mind follows below.)
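For context, the kind of probe we were thinking of is sketched here: an exec probe that calls the node stats API and fails when the main pipeline's events.out counter stops increasing between checks. The jq dependency, the /tmp state file and the 10-minute window are our own assumptions, not anything official, and a genuinely idle pipeline would also trip it, so suggestions for a better check are welcome.

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - |
        # Compare the main pipeline's events.out counter with the value seen by the previous probe
        current=$(curl -s 'localhost:9600/_node/stats/pipelines/main' | jq '.pipelines.main.events.out')
        previous=$(cat /tmp/last_events_out 2>/dev/null || echo "none")
        echo "$current" > /tmp/last_events_out
        # Pass on the first probe, or whenever the counter has moved; fail otherwise
        [ "$previous" = "none" ] || [ "$current" != "$previous" ]
  initialDelaySeconds: 120
  periodSeconds: 600
  failureThreshold: 3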
Here is the node/pipelines output from that Pod:
bash-4.2$ curl -XGET 'localhost:9600/_node/pipelines?pretty'
{
  "host" : "logstash-beats-0",
  "version" : "6.6.1",
  "http_address" : "0.0.0.0:9600",
  "id" : "14d98163-eb19-4adb-a6a6-8c1d4aeba54d",
  "name" : "logstash-beats-0",
  "pipelines" : {
    ".monitoring-logstash" : { },
    "main" : {
      "workers" : 1,
      "batch_size" : 125,
      "batch_delay" : 50,
      "config_reload_automatic" : true,
      "config_reload_interval" : 3000000000,
      "dead_letter_queue_enabled" : false
    }
  }
}
Here is the Pipeline stats output:
bash-4.2$ curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
{
"host": "logstash-beats-0",
"version": "6.6.1",
"http_address": "0.0.0.0:9600",
"id": "14d98163-eb19-4adb-a6a6-8c1d4aeba54d",
"name": "logstash-beats-0",
"pipelines": {
".monitoring-logstash": {
"events": null,
"plugins": {
"inputs": ,
"filters": ,
"outputs":
},
"reloads": {
"last_error": null,
"successes": 0,
"last_success_timestamp": null,
"last_failure_timestamp": null,
"failures": 0
},
"queue": null
},
"main": {
"events": {
"duration_in_millis": 870180,
"in": 775123,
"out": 775240,
"filtered": 775240,
"queue_push_duration_in_millis": 1930376
},
"plugins": {
"inputs": [{
"id": "0a2584b61bf0ea93b836354c8034d7c7f532fbaa7afad176b4887d69a43eb351",
"events": {
"out": 775123,
"queue_push_duration_in_millis": 1930376
},
"current_connections": 108,
"name": "beats",
"peak_connections": 435
}],
"filters": ,
"outputs": [{
"id": "9e6ce3b6bfe7511bfe1aa997d116005151cbec8f3dc1917f9b6175529b75eaee",
"documents": {
"successes": 775240
},
"events": {
"duration_in_millis": 803923,
"in": 775240,
"out": 775240
},
"bulk_requests": {
"successes": 36800,
"responses": {
"200": 36800
}
},
"name": "elasticsearch"
}]
},
"reloads": {
"last_error": null,
"successes": 0,
"last_success_timestamp": null,
"last_failure_timestamp": null,
"failures": 0
},
"queue": {
"events": 0,
"type": "persisted",
"capacity": {
"queue_size_in_bytes": 14153195,
"page_capacity_in_bytes": 67108864,
"max_queue_size_in_bytes": 1073741824,
"max_unread_events": 0
},
"data": {
"path": "/usr/share/logstash/data/queue/main",
"free_space_in_bytes": 8283140096,
"storage_type": "ext4"
}
}
}
}
}
Here is a partial stack trace:
[2019-06-20T18:31:11,658][WARN ][io.netty.util.concurrent.SingleThreadEventExecutor] An event executor terminated with non-empty task queue (15)
[2019-06-20T18:31:11,709][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>25, "name"=>"[main]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}, {"thread_id"=>24, "name"=>"[main]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/pipeline.rb:333:in `read_batch'"}]}}
[2019-06-20T18:31:11,730][INFO ][org.logstash.beats.BeatsHandler] [local: 0.0.0.0:5044, remote: undefined] Handling exception: event executor terminated
[2019-06-20T18:31:11,731][WARN ][io.netty.channel.AbstractChannelHandlerContext] Failed to submit an exceptionCaught() event.
java.util.concurrent.RejectedExecutionException: event executor terminated
at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:855) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:778) ~[netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:267) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at org.logstash.beats.BeatsHandler.exceptionCaught(BeatsHandler.java:91) [logstash-input-beats-5.1.8.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:400) [netty-all-4.1.30.Final.jar:4.1.30.Final]
at