We are using the Logstash translate filter plugin to enrich logs/events in our organization with user information based on IP addresses. The user data is loaded from a YAML file. The file is large (about 5.5 MB, roughly 15K entries).
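For context, a minimal sketch of our filter configuration; the field names and fallback value are illustrative, not our exact setup (the dictionary path is the one from the error below):

```
filter {
  translate {
    source          => "[source][ip]"
    target          => "[user][name]"
    dictionary_path => "/usr/share/user_db/user_data.yaml"
    fallback        => "unknown"
  }
}
```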
This worked correctly up to Logstash 8.6. However, after upgrading to Logstash 8.7, the pipeline does not start and the following error is logged:
Pipeline error {:pipeline_id=>"0_main", :exception=>#<LogStash::Filters::Dictionary::DictionaryFileError: Translate: The incoming YAML document exceeds the limit: 3145728 code points. when loading dictionary file at /usr/share/user_db/user_data.yaml>, :backtrace=>["org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:342)", "org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:263)", "org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingValue.produce(ParserImpl.java:694)", "org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:185)", "org.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:195)", "org.jruby.ext.psych.PsychParser.parse(PsychParser.java:210)", "org.jruby.ext.psych.PsychParser$INVOKER$i$parse.call(PsychParser$INVOKER$i$parse.gen)", "org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:393)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:206)", "org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:325)", "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)", "org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:86)"
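Note that the limit in the message (3145728) is a count of Unicode code points, not bytes. As a quick pre-flight check, a stdlib-only Python sketch like this can tell whether a dictionary file would trip SnakeYAML's default limit (the threshold is the documented default; the helper name is ours):

```python
# SnakeYAML's default document size limit, in Unicode code points.
SNAKEYAML_DEFAULT_LIMIT = 3 * 1024 * 1024  # 3145728

def exceeds_snakeyaml_limit(path: str, limit: int = SNAKEYAML_DEFAULT_LIMIT) -> bool:
    """Return True if the file at `path` exceeds `limit` code points.

    len() on a Python str counts code points, matching how
    SnakeYAML measures the incoming document.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return len(text) > limit
```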
I believe this is due to an upgrade of the SnakeYAML or JRuby version bundled with Logstash. I found a similar bug report against JRuby.
Is there a way in Logstash to configure the maximum data limit for SnakeYAML?
As a workaround, I have switched to JSON, since we had the same information available in JSON as well.
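In case it helps others: if the dictionary is a flat one-entry-per-line YAML mapping (`ip: user`, as ours is), converting it to JSON needs no YAML parser at all. A stdlib-only sketch, with the function name and quoting assumptions being ours:

```python
import json

def flat_yaml_to_json(yaml_text: str) -> str:
    """Convert a flat 'key: value' YAML mapping to a JSON object string.

    Handles only the simple one-mapping-per-line case (no nesting,
    anchors, or multi-line values) -- enough for a translate-filter
    dictionary of IP -> user entries.
    """
    mapping = {}
    for line in yaml_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        # Strip optional surrounding quotes from key and value.
        mapping[key.strip().strip("'\"")] = value.strip().strip("'\"")
    return json.dumps(mapping, indent=2)
```

The resulting file can then be pointed at via `dictionary_path` with a `.json` extension instead of the YAML file.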
Thanks