Logstash-6.0.0 displays an unhelpful NullPointerException Error in case of corrupted persistent queue

I upgraded my logstash node to 6.0.0 from 5.6.0. Initially the node worked fine, but today I tried to restart it and the node refused to start. The error logged was

[2017-11-22T10:41:30,054][ERROR][logstash.agent           ] An exception happened when converging configuration {:exception=>LogStash::Error, :message=>"Don't know how to handle `Java::JavaLang::NullPointerException` for `LogStash::PipelineAction::Create/pipeline_id:main`", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/converge_result.rb:27:in `create'", "/usr/share/logstash/logstash-core/lib/logstash/converge_result.rb:67:in `add'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:347:in `block in converge_state'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:141:in `with_pipelines'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:332:in `block in converge_state'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:319:in `converge_state'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:166:in `block in converge_state_and_update'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:141:in `with_pipelines'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:164:in `converge_state_and_update'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:90:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:362:in `block in execute'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/task.rb:24:in `block in initialize'"]}

So after many restarts, I downgraded to 5.6.0 and tried restarting. Now logstash gave up with an error like

[2017-11-22T10:59:44,603][ERROR][logstash.pipeline ] Logstash failed to create queue {"exception"=>"Page file size 0 different to configured page capacity 262144000 for /data/logstash/queue/main/page.16", "backtrace"=>["org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:133:in open'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:41:in with_queue'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:30:in create_file_based'", "/usr/share/logstash/logstash-core/lib/logstash/queue_factory.rb:29:in create'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:168:in initialize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:286:in create_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:95:in register_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:309:in execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:204:in run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in (root)'"]}
[2017-11-22T10:59:44,604][ERROR][logstash.agent ] Cannot create pipeline {:reason=>"Page file size 0 different to configured page capacity 262144000 for /data/logstash/queue/main/page.16"}

Apparently the page file was corrupted. So I removed all existing page files and upgraded back to 6.0.0. Tried starting Logstash and it worked.

Has the helpful page size wrong message been replaced with a NullPointerException due to a bug?

Just to report that I got hit with the same issue on another node. I looked at the persistent queue directory and saw it had three page files - page.105, page.125 and page.126. It also had all the checkpoint files from 105 to 125 and `head. Removed all files from this directory and logstash started correctly.

I have another open question about "counter incremented with a negative value" that freezes pipeline workers which I suspect is to blame for "corrupted" persistent queues.

And the same problem strikes again. Here is a snapshot of my persistent queue dir

[root@logstash1 main]# ll
total 362552
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:55 checkpoint.107
-rw-r--r--. 1 logstash logstash        34 Nov 22 20:25 checkpoint.108
-rw-r--r--. 1 logstash logstash        34 Nov 22 21:25 checkpoint.109
-rw-r--r--. 1 logstash logstash        34 Nov 22 22:37 checkpoint.110
-rw-r--r--. 1 logstash logstash        34 Nov 23 00:01 checkpoint.111
-rw-r--r--. 1 logstash logstash        34 Nov 23 01:27 checkpoint.112
-rw-r--r--. 1 logstash logstash        34 Nov 23 03:24 checkpoint.113
-rw-r--r--. 1 logstash logstash        34 Nov 23 05:15 checkpoint.114
-rw-r--r--. 1 logstash logstash        34 Nov 23 07:09 checkpoint.115
-rw-r--r--. 1 logstash logstash        34 Nov 23 08:04 checkpoint.116
-rw-r--r--. 1 logstash logstash        34 Nov 23 08:18 checkpoint.117
-rw-r--r--. 1 logstash logstash        34 Nov 23 08:34 checkpoint.118
-rw-r--r--. 1 logstash logstash        34 Nov 23 08:45 checkpoint.119
-rw-r--r--. 1 logstash logstash        34 Nov 23 08:57 checkpoint.120
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:07 checkpoint.121
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:18 checkpoint.122
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:24 checkpoint.123
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:30 checkpoint.124
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:36 checkpoint.125
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:42 checkpoint.126
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:48 checkpoint.127
-rw-r--r--. 1 logstash logstash        34 Nov 23 09:56 checkpoint.128
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:02 checkpoint.129
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:05 checkpoint.130
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:09 checkpoint.131
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:12 checkpoint.132
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:15 checkpoint.133
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:19 checkpoint.134
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:22 checkpoint.135
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:25 checkpoint.136
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:28 checkpoint.137
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:32 checkpoint.138
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:35 checkpoint.139
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:38 checkpoint.140
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:41 checkpoint.141
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:44 checkpoint.142
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:47 checkpoint.143
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:50 checkpoint.144
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:54 checkpoint.145
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:55 checkpoint.146
-rw-r--r--. 1 logstash logstash        34 Nov 23 10:56 checkpoint.head
-rw-r--r--. 1 logstash logstash 262144000 Nov 22 19:30 page.107
-rw-r--r--. 1 logstash logstash 262144000 Nov 23 10:55 page.146
-rw-r--r--. 1 logstash logstash 262144000 Nov 23 10:55 page.147

Due to corrupted queue, I got the same NullPointer Exception. Tried to remove all intermediate checkpoint files and Logstash complained "checkpoint.108 not found". Then tried removing just page.107 and checkpoint.107 (leaving 146 and 147 as it is) but logstash complained "checkpoint.107 not found". So had to empty out the queue and restart Logstash.

Hi @Dheeraj_Gupta, thanks for the report.

Could you elaborate a bit more on your exact restart process? Was LS stuck or did the shutdown process hang or did the shutdown completed "cleanly" - how did you do the shutdown? Also do you have any other relevant logs prior to the shutdown and prior the the exception on restart?

Thanks,
Colin

Hi @colinsurprenant . Thanks for your reply.

The restart was using service logstash restart (service script as provided by logstash RPM).

I have another open question here, where I have reported an error logged by metrics leading to drop in performance. Service restart was necessitated because of that issue.

The shutdown process did not "hang". It did take some time with a number of messages about "inflight_count"=> like it always does, but the last message logged was Pipeline terminated (as always) so to me it appeared that shutdown was clean.

Logs on restart are:

[2017-11-22T10:38:34,625][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2017-11-22T10:38:34,629][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2017-11-22T10:38:34,842][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
...a really big line that I can't add because of post limits (see screenshot)...
[2017-11-22T10:58:19,939][ERROR][logstash.agent           ] An exception happened when converging configuration {:exception=>LogStash::Error, :message=>"Don't know how to handle `Java::JavaLang::NullPointerException` for `LogStash::PipelineAction::Create/pipeline_id:main`", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/converge_result.rb:27:in `create'", "/usr/share/logstash/logstash-core/lib/logstash/converge_result.rb:67:in `add'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:347:in `block in converge_state'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:141:in `with_pipelines'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:332:in `block in converge_state'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:319:in `converge_state'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:166:in `block in converge_state_and_update'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:141:in `with_pipelines'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:164:in `converge_state_and_update'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:90:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:362:in `block in execute'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/task.rb:24:in `block in initialize'"]}

For 5.6.0, the same node gave the following stack-trace

[2017-11-22T10:59:41,020][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2017-11-22T10:59:41,026][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2017-11-22T10:59:44,603][ERROR][logstash.pipeline        ] Logstash failed to create queue {"exception"=>"Page file size 0 different to configured page capacity 262144000 for /data/logstash/queue/main/page.16", "backtrace"=>["org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:133:in `open'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:41:in `with_queue'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:30:in `create_file_based'", "/usr/share/logstash/logstash-core/lib/logstash/queue_factory.rb:29:in `create'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:168:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:286:in `create_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:95:in `register_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:309:in `execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:204:in `run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in `(root)'"]}
[2017-11-22T10:59:44,604][ERROR][logstash.agent           ] Cannot create pipeline {:reason=>"Page file size 0 different to configured page capacity 262144000 for /data/logstash/queue/main/page.16"}

After this I removed all existing pages, upgraded back to 6.0.0 and then logstash service started correctly.

Regards,
Dheeraj

Dheeraj,

I have hopefully good news for you. Again thanks for you report.

The NPE exception you originally reported should be fixed in the upcoming 6.0.1 release, see issue https://github.com/elastic/logstash/issues/8769 and PR https://github.com/elastic/logstash/pull/8770

Also, when you reverted back to 5.6 you unfortunately hit another issue caused by a missing commit in the 5.6 branch. I also fixed that and upcoming 5.6.5 will fix this, see https://github.com/elastic/logstash/pull/7810.

Thanks,
Colin

1 Like

Thank you Colin for taking a look at this and fixing it so quickly. I will upgrade to 6.0.1(when it releases) and report back if I encounter this again. Thanks Again!

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.