I notice that my issue is similar to "Logstash runs as service but stops sending logs over http after some time", but there was no resolution in that thread. I'm hoping someone has found a solution since then.
My 7.4.2 ELK + Filebeat stack on RHEL7 has enjoyed many months of uninterrupted service (since installation). We use it to ingest system logs from over 80 servers. This morning I noticed in Kibana that there were no results after approximately 1:30am.
With no messages in any logs that I could find that might point to the reason, I restarted Logstash and saw that log messages were again being processed. Unfortunately, they only continued for a short while and only up until the current time. Over the course of the day, I've repeated this process and it's always the same thing: logs are processed for a shorter and shorter amount of time, depending on the time between restarts.
Each time I shut down the service using "systemctl stop logstash", I see the exact same message repeated about a dozen times before the service eventually exits with failure:
[2020-07-28T15:33:01,338][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>26, "name"=>"[main]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-6.0.3-java/lib/logstash/inputs/beats.rb:204:in `run'"}], ["LogStash::Filters::Grok", {"match"=>["syslog_message", "%{DATA:bash_user} %{DATA:bash_usertty} %{TIMESTAMP_ISO8601:bash_ttydate} \\(%{DATA:bash_usersource}\\) %{DATA:bash_cmduser} \\[%{NUMBER:bash_id}\\] %{TIMESTAMP_ISO8601:bash_date} %{DATA:bash_command} \\[%{NUMBER:bash_return}\\]", "syslog_message", "%{DATA:bash_user} %{DATA:bash_usertty} %{TIMESTAMP_ISO8601:bash_ttydate} \\(%{DATA:bash_usersource}\\) %{DATA:bash_cmduser} \\[%{NUMBER:bash_id}\\] \\[%{NUMBER:bash_return}\\]"], "add_tag"=>["bash"], "id"=>"4227456ebf7d90cb390424c0518b027ace0edcce8fd4d5e6fb7be467f9fff883", "remove_tag"=>["_unfiltered"]}]=>[{"thread_id"=>24, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:182:in `match'"}, {"thread_id"=>25, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:182:in `match'"}]}}
The "id" listed (4227456ebf7d90cb390424c0518b027ace0edcce8fd4d5e6fb7be467f9fff883) is repeated over and over, regardless of how many times I've restarted logstash.
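For anyone who wants to pull the useful bits out of that warning, here is a rough Python sketch. It is not part of Logstash: the names `summarize` and `SAMPLE` are made up for illustration, and `SAMPLE` is the warning above with the timestamp prefix and the long "match" array left out. It only assumes the line keeps the `"key"=>value` layout shown above. It lists each stalled thread with the call it is stuck in, plus any 64-character plugin "id" found in the line.

```python
import re

# Payload of the warning quoted above, with the timestamp/logger prefix and the
# long grok "match" array omitted for readability (illustrative shortening only).
SAMPLE = (
    '{"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>26, '
    '"name"=>"[main]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/'
    'logstash-input-beats-6.0.3-java/lib/logstash/inputs/beats.rb:204:in `run\'"}], '
    '["LogStash::Filters::Grok", {"add_tag"=>["bash"], '
    '"id"=>"4227456ebf7d90cb390424c0518b027ace0edcce8fd4d5e6fb7be467f9fff883", '
    '"remove_tag"=>["_unfiltered"]}]=>[{"thread_id"=>24, "name"=>"[main]>worker0", '
    '"current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/jls-grok-0.11.5/lib/'
    'grok-pure.rb:182:in `match\'"}, {"thread_id"=>25, "name"=>"[main]>worker1", '
    '"current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/jls-grok-0.11.5/lib/'
    'grok-pure.rb:182:in `match\'"}]}}'
)

# One stalled-thread entry: thread id, thread name, and the call it is sitting in.
THREAD_RE = re.compile(
    r'"thread_id"=>(\d+), "name"=>"([^"]*)", "current_call"=>"([^"]*)"'
)
# A 64-character hex plugin id, as it appears inside the filter's settings.
PLUGIN_ID_RE = re.compile(r'"id"=>"([0-9a-f]{64})"')


def summarize(line):
    """Return the plugin ids and stalled threads found in one warning line."""
    threads = [
        (int(tid), name, call) for tid, name, call in THREAD_RE.findall(line)
    ]
    return {"plugin_ids": PLUGIN_ID_RE.findall(line), "threads": threads}


if __name__ == "__main__":
    info = summarize(SAMPLE)
    for plugin_id in info["plugin_ids"]:
        print("plugin id:", plugin_id)
    for tid, name, call in info["threads"]:
        print(tid, name, "->", call)
```

Run against the warning above, this shows both worker threads sitting in the `match` call of grok-pure.rb, and the "id" appearing inside the Grok filter's own settings, which suggests (though I can't confirm it) that it labels the filter rather than a particular file or line.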
I experience similar behaviour when trying to stop filebeat using "systemctl stop filebeat". The service eventually exits with an error:
filebeat.service stop-sigterm timed out. Killing.
I suspect that one log file or line may be causing logstash and/or filebeat to stop forwarding and am curious to know if the "id" from the logstash shutdown has any relevance to that file/line.
Alternatively, if anyone else has experienced similar behaviour and has found a solution, I'd be keen to hear it. So far I have been unable to get my stack working again.