I am trying to load UK housing price data from an HM Land Registry CSV. Column 4 in that CSV should be the postcode, but it is sometimes empty, and sometimes that causes dissect to throw an ArrayIndexOutOfBoundsException. It typically gets through 10,000 to 20,000 rows before failing. In all the simpler test cases I tried, I get a _dissectfailure when working on a nil field rather than an exception. The obvious 'if [postcode]' does prevent the exception. This is in 6.0.0-rc1.
And then this, where the exception occurs at JavaDissectorLibrary.java:130 (cf. JavaDissectorLibrary.java:113 above):
[ERROR] 2017-10-17 12:55:36.470 [[main]>worker1] pipeline - Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash.[...]
Exception in thread "[main]>worker1" java.lang.ArrayIndexOutOfBoundsException: 0
at org.logstash.dissect.DissectorErrorUtils.backtrace(DissectorErrorUtils.java:16)
at org.logstash.dissect.JavaDissectorLibrary$RubyDissect.logException(JavaDissectorLibrary.java:224)
at org.logstash.dissect.JavaDissectorLibrary$RubyDissect.dissect(JavaDissectorLibrary.java:130)
at org.logstash.dissect.JavaDissectorLibrary$RubyDissect.dissect_multi(JavaDissectorLibrary.java:140)
at org.logstash.dissect.JavaDissectorLibrary$RubyDissect$INVOKER$i$2$0$dissect_multi.call(JavaDissectorLibrary$RubyDissect$INVOKER$i$2$0$dissect_multi.gen)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:193)
at usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_dissect_minus_1_dot_0_dot_12.lib.logstash.filters.dissect.invokeOther2:dissect_multi(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-dissect-1.0.12/lib/logstash/filters/dissect.rb:182)
at usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_dissect_minus_1_dot_0_dot_12.lib.logstash.filters.dissect.RUBY$method$multi_filter$0(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-dissect-1.0.12/lib/logstash/filters/dissect.rb:182)
at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:103)
at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:163)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:163)
at usr.share.logstash.logstash_minus_core.lib.logstash.filter_delegator.invokeOther10:multi_filter(/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:48)
at usr.share.logstash.logstash_minus_core.lib.logstash.filter_delegator.RUBY$method$multi_filter$0(/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:48)
at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:103)
at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:163)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)
at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)
at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:132)
at org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:148)
at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:73)
at org.jruby.runtime.Block.call(Block.java:124)
at org.jruby.RubyProc.call(RubyProc.java:289)
[...]
Then it carries on processing for a while until we get the same ArrayIndexOutOfBoundsException in worker0, at which point Logstash stops processing, since no workers are left running.
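For anyone hitting the same thing, the 'if [postcode]' guard mentioned earlier might look roughly like the sketch below. This is only an illustration: it assumes an earlier filter has already put column 4 into a field named [postcode], and the dissect mapping and the field names outcode and incode are hypothetical stand-ins, not my actual configuration.

```
filter {
  # Only run dissect when [postcode] exists and is non-empty,
  # so dissect never sees the nil field that triggers the exception.
  if [postcode] {
    dissect {
      # Hypothetical mapping: split "AB1 2CD" into two parts.
      mapping => { "postcode" => "%{outcode} %{incode}" }
    }
  }
}
```

This only works around the problem. I would still expect dissect to tag the event with _dissectfailure on an empty field instead of taking down the worker.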