Large XML crashes logstash with OOM

asatsi · September 4, 2015, 12:19pm

Hi,

I'm working on a setup where large XML documents appear as part of the logs. The XML documents can be 10 to 20 MB each. With files this large, Logstash intermittently crashes with an OutOfMemory error or fails with "entity expansion has grown too large" errors. Has anyone come across this situation and found a way out? I would really appreciate your help.

Cheers,
Satish

asatsi · September 5, 2015, 4:46am

@PhaedrusTheGreek It can be reproduced using the configuration file below:

input
{
        file
        {
                path => "/tmp/xml.log"
        }
}

filter
{
        multiline
        {
                pattern => "^<ns0:"
                negate => true
                what => previous
        }
        xml
        {
                source => [ "message" ]
                target => [ "x" ]
        }
}

output
{
        stdout
        {
                codec => rubydebug
        }
}

The input XML gist is here: https://gist.github.com/asatsi/330e5c23830752d53bee

FYI, I am running logstash 1.5.3.

magnusbaeck · September 6, 2015, 2:20pm

Logstash's default JVM heap is 500 MB and I think that should be enough for parsing a 20 MB XML file. Have you tried increasing the heap size? Depending on how you start Logstash you can do that via /etc/default/logstash, /etc/sysconfig/logstash, or by setting the LS_HEAP_SIZE environment variable. Try "1024m" for starters.
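
A minimal sketch of the heap-size change suggested above, assuming Logstash 1.5.x, where the startup scripts read the LS_HEAP_SIZE variable. The config path in the final comment is hypothetical; substitute your own.

```shell
# Sketch: raise the Logstash JVM heap from the default to 1 GB.

# Option 1 (package installs): put this line in /etc/default/logstash
# or /etc/sysconfig/logstash, then restart the Logstash service.
LS_HEAP_SIZE="1024m"

# Option 2 (manual start): export the variable in the shell that
# launches Logstash so the startup script inherits it.
export LS_HEAP_SIZE="1024m"

# Then start Logstash as usual, for example:
#   bin/logstash -f /path/to/xml.conf    # hypothetical config path
```

If 1024m is still not enough for the largest documents, the same variable can be raised further in steps.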

asatsi · September 8, 2015, 10:13am

Increasing the heap size to 4096m helped avoid the crashes for now. Thanks!

PhaedrusTheGreek · September 8, 2015, 4:11pm

@asatsi, I am able to reproduce the problem using your configuration and the provided XML file. Because each XML document is 20 MB, this is expected. The XML DOM parsing library expands each XML document into a much larger object in memory, so the only workaround is to do as you did and increase the Java heap size.

Also, please use caution if you intend to aggressively index documents of this size into Elasticsearch. Search and aggregation should perform well, but indexing, retrieving, and merging will be disk intensive.
