I am using the XML filter to filter data from S3 files and it is working for some of the documents, but I am getting a lot of errors on other documents. From Elasticsearch, I see it rejects a lot of them with this error:
[2018-03-17T19:16:06,918][DEBUG][o.e.a.b.TransportShardBulkAction] [us-patent-grants-2018.03.17][1] failed to execute bulk item (index) BulkShardRequest [[us-patent-grants-2018.03.17][1]] containing [index {[us-patent-grants-2018.03.17][doc][L7VjNWIBNxeDr0PE-AuG], source[n/a, actual length: [168.6kb], max length: 2kb]}]
and from Logstash logs I get:
Line: 2
Position: 39
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:341:in `pull_event'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in `pull'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:23:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:187:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164:in `block in multi_filter'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:47:in `multi_filter'", "(eval):42:in `block in filter_func'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:447:in `filter_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:426:in `worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385:in `block in start_workers'"]}
I'm not really sure what this means or why it's happening. Any ideas?
I am using the multiline input codec:
codec => multiline {
pattern => "^*"
what => "next"
max_lines => 10000
max_bytes => "100 MiB"
}
and the xml filter:
xml {
source => "message"
store_xml => true
target => "message"
}