How to debug workers not pulling from persistent queue

Hi!

I noticed I was no longer receiving data from Logstash so I logged in and noticed the persistent queue filled up but Logstash is not picking anything up from it:

$ sudo ls -l /home/naas_logstash/logstash/data/queue/main/
total 1048672
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:37 checkpoint.272
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:35 checkpoint.273
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:35 checkpoint.274
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:35 checkpoint.275
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:35 checkpoint.276
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:35 checkpoint.277
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:35 checkpoint.278
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:36 checkpoint.279
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:36 checkpoint.280
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:36 checkpoint.281
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:36 checkpoint.282
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:36 checkpoint.283
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:37 checkpoint.284
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:37 checkpoint.285
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:37 checkpoint.286
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:37 checkpoint.287
-rw-r--r-- 1 naas_logstash naas_logstash       34 Aug 28 13:37 checkpoint.head
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:34 page.272
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:35 page.273
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:35 page.274
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:35 page.275
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:35 page.276
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:35 page.277
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:35 page.278
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:36 page.279
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:36 page.280
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:36 page.281
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:36 page.282
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:36 page.283
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:37 page.284
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:37 page.285
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:37 page.286
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:37 page.287
-rw-r--r-- 1 naas_logstash naas_logstash 67108864 Aug 28 13:37 page.288

It is Aug 28 15:52 at the time of writing this so it's been stuck for about 2 hours now.

Here are my settings:

$ sudo cat /home/naas_logstash/logstash/config/logstash.yml
config.reload.automatic: true
dead_letter_queue.enable: true
http.host: 0.0.0.0
path.config: /home/naas_logstash/logstash/conf.d/
pipeline.batch.size: 250
pipeline.workers: 8
queue.checkpoint.writes: 1
queue.type: persisted

The logs show nothing interesting but I'm guessing the reason nothing is being picked up is because all the workers are stuck? If that's the case does anybody know how I would go about debugging that? Whenever I restart Logstash it processes for a minute then gets stuck again.

EDIT: I'm using Logstash 6.2.3

[2018-08-28T16:30:28,486][WARN ][logstash.shutdownwatcher ]
{
    "inflight_count"=>2000,
    "stalling_thread_info"=>{
        "other"=>[
            {"thread_id"=>41, "name"=>"[main]<tcp", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-tcp-5.0.3-java/lib/logstash/inputs/tcp.rb:154:in `run'"},
            {"thread_id"=>42, "name"=>"[main]<udp", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-udp-3.2.1/lib/logstash/inputs/udp.rb:106:in `select'"},
            {"thread_id"=>43, "name"=>"[main]<naasdedup", "current_call"=>"[...]/vendor/local_gems/01431115/logstash-output-naasdedup-0.1.5/lib/logstash/outputs/naasdedup.rb:39:in `sleep'"},
            {"thread_id"=>44, "name"=>"[main]<http", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-http-3.0.8/lib/logstash/inputs/http.rb:186:in `join'"},
            {"thread_id"=>45, "name"=>"[main]<http", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-http-3.0.8/lib/logstash/inputs/http.rb:186:in `join'"},
            {"thread_id"=>46, "name"=>"[main]<dead_letter_queue", "current_call"=>"[...]/vendor/bundle/jruby/2.3.0/gems/logstash-input-dead_letter_queue-1.1.4/lib/logstash/inputs/dead_letter_queue.rb:60:in `run'"}
        ],
        ["LogStash::Filters::Metrics", {"meter"=>"events", "add_tag"=>"metric", "id"=>"2de6af2f53d1012f0c58f1199db12424385fd834a35d13e7f5e53697b0916911"}]=>[
            {"thread_id"=>31, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"},
            {"thread_id"=>32, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"},
            {"thread_id"=>33, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"}
        ],
        ["LogStash::Filters::Gchg", {"json_path"=>"/home/naas_logstash/logstash/data/gchg/gchg.json", "id"=>"2076257caa3d16f061384d20564c0e5e0547dcea579ebc0508e8a0af2c2d1f34"}]=>[
            {"thread_id"=>34, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"},
            {"thread_id"=>35, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"},
            {"thread_id"=>36, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"},
            {"thread_id"=>37, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"},
            {"thread_id"=>38, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:54:in `write'"}
        ]
    }
}

I tried modifying the filters and this is what keeps getting logged when it tried to reload the pipeline.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.