I'm working on a setup where large XML files are part of the logs; individual XMLs can be 10-20 MB. With files this large, Logstash intermittently crashes with OutOfMemory errors or "entity expansion has grown too large" errors. Has anyone come across this situation and found a way out? I'd really appreciate any help.
Logstash's default JVM heap is 500 MB, which I would expect to be enough for parsing a 20 MB XML file, but have you tried increasing the heap size anyway? Depending on how you start Logstash you can do that via /etc/default/logstash, /etc/sysconfig/logstash, or by setting the LS_HEAP_SIZE environment variable. Try "1024m" for starters.
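For example, on a package-based install you could add something like the following to /etc/default/logstash (or /etc/sysconfig/logstash on RPM-based systems). The value is only a starting point, not a tuned recommendation:

    # Raise the Logstash JVM heap from the 500 MB default to 1 GB
    LS_HEAP_SIZE="1024m"

If you start Logstash by hand from a shell instead, exporting the same variable before launching has the same effect (the config path here is just a placeholder):

    export LS_HEAP_SIZE="1024m"
    bin/logstash -f /path/to/your.conf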
@asatsi, I am able to reproduce the problem using your configuration and the provided XML file. Given that each XML object is 20 MB, this is expected: the XML DOM parsing library expands each document into a much larger object in memory, so the only workaround is to do as you did and increase the Java heap size.
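For anyone else landing on this thread, the general shape of such a pipeline looks something like the sketch below. This is only an illustration with placeholder paths, patterns and sizes, not the configuration from the original report; the point is that with store_xml enabled the xml filter builds a full DOM for every event, which is where the memory amplification happens:

    input {
      file {
        path => "/var/log/app/*.xml"      # placeholder path
        codec => multiline {
          pattern => "^<\?xml"            # start a new event at each XML declaration
          negate => true
          what => "previous"
          max_bytes => "30 MiB"           # raise the 10 MiB default so a 20 MB document fits in one event
          auto_flush_interval => 5
        }
      }
    }

    filter {
      xml {
        source => "message"
        target => "doc"
        store_xml => true                 # builds a full DOM of the document in memory
      }
    }

With documents this size, each event temporarily holds both the raw 20 MB string and the parsed DOM, so the effective per-event footprint is several times the file size, hence the need for a larger heap.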
Also, please use caution if you intend to aggressively index documents of this size into Elasticsearch. Search and aggregation should perform well, but indexing, retrieval and merging will be disk-intensive.