Reducing stalling_threads on restart of logstash

Good afternoon,

I've been using Logstash for a while and have been trying to troubleshoot the cause of the stalling_threads_info warnings that appear when restarting the service. I've tried swapping out a variety of inputs and filters, but threads always seem to stall regardless of the configuration. Are there any ways to get more information about the cause of these in Logstash 7.2?

At one point it showed a thread stalled in a mutate filter:

[2019-07-29T14:28:35,929][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>1742, "name"=>"[main]<file", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.1.10/lib/filewatch/watched_files_collection.rb:51:in `sort!'"}], ["LogStash::Filters::Mutate", {"add_tag"=>["19-filter-computerLogins"], "id"=>"3102807155779a12fa498fbd00014146c0d486c1ebfa0d5635fced002ec21d8d"}]=>[{"thread_id"=>1734, "name"=>"[main]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>1735, "name"=>"[main]>worker1", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>1736, "name"=>"[main]>worker2", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>1737, "name"=>"[main]>worker3", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}]}}

Another time it was while processing a CSV file:

[2019-07-29T14:56:38,311][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>66, "name"=>"[main]<file", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.1.10/lib/filewatch/stat/generic.rb:17:in `stat'"}, {"thread_id"=>70, "name"=>"[main]<jdbc", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler.rb:170:in `join'"}, {"thread_id"=>71, "name"=>"[main]<file", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.1.10/lib/filewatch/watch.rb:54:in `sleep'"}, {"thread_id"=>72, "name"=>"[main]<tcp", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-tcp-6.0.3-java/lib/logstash/inputs/tcp.rb:152:in `run'"}], ["LogStash::Filters::CSV", {"add_field"=>["received_at", "%{@timestamp}"], "id"=>"8b55b29a05ec76c8dbf717034a38e1832d8ef13e368ce749246d1ad267f7bebe", "separator"=>",", "columns"=>["action", "username", "eventTime", "ip"]}]=>[{"thread_id"=>59, "name"=>"[main]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>60, "name"=>"[main]>worker1", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>61, "name"=>"[main]>worker2", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>62, "name"=>"[main]>worker3", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}]}}

and most recently while processing a Cisco switch syslog entry:

[2019-07-29T14:58:39,454][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>65, "name"=>"[main]<file", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.1.10/lib/filewatch/watched_files_collection.rb:80:in `block in []='"}, {"thread_id"=>68, "name"=>"[main]<udp", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-udp-3.3.4/lib/logstash/inputs/udp.rb:121:in `select'"}], ["LogStash::Filters::Grok", {"match"=>["message", "<%{NUMBER:message_type_id}>%{BASE10NUM:event_no}: %{CISCOTIMESTAMP:cisco_time} %{WORD}: %%{NOTSPACE:cisco_type}: %{GREEDYDATA:description} %{NOTSPACE:interface}, %{GREEDYDATA:status}", "message", "<%{NUMBER:message_type_id}>%{BASE10NUM:event_no}: %{CISCOTIMESTAMP:cisco_time} %{WORD}: %%{NOTSPACE:cisco_type}: %{GREEDYDATA:description} %{NOTSPACE:interface}: %{GREEDYDATA:status}"], "add_tag"=>["cisco-switch-processed"], "id"=>"6645675d6b237fd7209905f92aafccc1b23bdd6b7703f0f1febb584441819132", "tag_on_failure"=>["cisco-switch-todo"]}]=>[{"thread_id"=>57, "name"=>"[main]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>58, "name"=>"[main]>worker1", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>59, "name"=>"[main]>worker2", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}, {"thread_id"=>60, "name"=>"[main]>worker3", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"}]}}

I'd love to have a more stable and reliable install, and I'm just looking for some tips on how to troubleshoot this.

Thanks!

The stalling_threads_info is a snapshot in time: it shows which threads are still processing events, and where execution is at the moment the snapshot is taken.

Because we don't want to lose events or data, Logstash will wait until all workers have managed to output their batches downstream.

I see that you have the file input thread listed as stalled. This means that it is waiting on the queue, i.e. it still has more events to add to the queue. It reads chunks of data, and a single chunk can contain thousands of events. While it is processing a chunk it does not heed the shutdown signal. You can make the chunks much smaller, which will make the file input more responsive to the shutdown signal.

file_chunk_size is the setting that controls this.

The default in bytes is 32768 (32 KB); you can set this as low as 256 with no appreciable change in throughput.
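As a sketch, a file input with a smaller chunk size would look something like this (the path is a placeholder; substitute your own):

```
input {
  file {
    path => "/var/log/example/*.log"  # hypothetical path for illustration
    file_chunk_size => 4096           # read 4 KB chunks instead of the 32 KB default
  }
}
```

A smaller chunk means the input checks for the shutdown signal more often, so restarts should stall less, at the cost of slightly more read calls.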