In Logstash I am trying to replace a prefix in the XML that has no namespace definition and then index the document into Elasticsearch. I receive an error while parsing some huge XML documents. I need the full XML to be stored in Elasticsearch for analytics. Please find below the error snippet and the yml file for the Logstash configuration.
Error -
exception=>#<RuntimeError: entity expansion has grown too large>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/text.rb:399:in `block in unnormalize'", "org/jruby/RubyString.java:3056:in `gsub'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/text.rb:396:in `unnormalize'"
The error is occurring in the rexml library, which is used by the XmlSimple library that the logstash xml filter uses. The default limit on the size of an XML entity is 10 KB. Note that, as far as I can see, the problem is not the size of the XML document, it is the size of an entity within that document.
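To illustrate the point about document size versus entity size, here is a minimal standalone sketch in plain Ruby, outside Logstash. It assumes a Ruby where the limit is exposed as `REXML::Security.entity_expansion_text_limit` (older versions exposed it on `REXML::Document` instead), and the helper name `root_text_size` is made up for this example. As far as I can tell from `text.rb`, the check adds up the expanded bytes of the references in one text node, so a run of escaped characters such as `&lt;` is enough to trip it.

```ruby
require "rexml/document"

# Hypothetical helper for this sketch: parse the XML with REXML and return
# the byte size of the root element's text, or the error message if REXML
# refuses to expand the entity references.
def root_text_size(xml)
  REXML::Document.new(xml).root.text.bytesize
rescue StandardError => e
  e.message
end

# A large document (about 1 MB) with no entity references at all.
plain = "<a>" + ("x" * 1_000_000) + "</a>"

# A much smaller document (about 44 KB) whose single text node contains
# 11,000 references; expanded, they exceed the default 10,240 byte limit.
escaped = "<a>" + ("&lt;" * 11_000) + "</a>"

puts "default limit: #{REXML::Security.entity_expansion_text_limit}"
puts "plain:   #{root_text_size(plain)}"
puts "escaped: #{root_text_size(escaped)}"
```

The large plain document should parse without complaint, while the smaller escaped one should report an entity expansion error. That would fit the case of a payload that embeds another XML document as escaped text.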
There is no way to pass rexml configuration options to the xml filter.
The limit is a class variable. So let me say that it would be a terrible, terrible idea to use a ruby filter to set it before calling the xml filter. Do not do it.
If you do not actually need to store the entire document, you can use the xpath option instead (with store_xml set to false, so that XmlSimple is skipped). In that case the XML is parsed using nokogiri instead of XmlSimple. That may not have the same limits.
Is the rexml library used in the xml filter only because a ruby filter (in this case mutate gsub) runs before the xml filter?
If so, I am using it to replace the invalid namespace prefix present in "message". Is there a way to achieve the same thing without having a ruby filter before the xml filter?
I would not be able to provide XPath expressions for the nodes to parse, since logs from many API transactions are being pushed to Elasticsearch through Logstash, so it is not feasible to list all the tag names. Is there any other workaround, please?
Would it make any difference for aggregations in Kibana if the message were stored as XML instead of text? As text I am able to index all these XMLs, but because the field is huge I am not able to run aggregations on those message fields.