How to effectively use the multiline codec with a crazy long XML file?

iam_thadiyan · December 20, 2017, 8:59am

I have an XML file from a production machine that contains more than 600 MB of XML data (about 15,000,000 lines).

This file is very important from an analytics point of view, but because of JVM limitations I find it difficult to push the entire file through the multiline codec in the input. I also don't know whether I will get files even bigger than this one.

Has anyone handled this much data in a single file before? If yes, what's the strategy?
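One strategy that may be worth trying, offered as a sketch under assumptions rather than a tested answer: pre-process the file outside Logstash with a streaming XML parser, so that each record becomes a single line and no single event has to hold hundreds of megabytes. The sketch below uses Python's `xml.etree.ElementTree.iterparse` and assumes the records are repeated child elements under one root element. `record_tag` is a placeholder for the real element name (the `entry` used in testing is hypothetical), and namespaced tags would have to be given in the `{namespace}name` form.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def iter_records(source: IO[bytes], record_tag: str) -> Iterator[str]:
    """Stream <record_tag> elements out of a large XML document.

    Each record is yielded as a single-line XML string, so roughly one
    record is held in memory at a time instead of the whole file.
    record_tag is a placeholder (hypothetical) for the real element name.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's "start"
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # tostring() would otherwise include trailing whitespace
            text = ET.tostring(elem, encoding="unicode")
            # Put the record on one line so a line-based input can read it.
            yield text.replace("\r", " ").replace("\n", " ")
            # Drop processed records from the tree to keep memory flat.
            root.clear()
```

Each yielded string could be written to a new file, one record per line, which a plain `file` input could then read without the multiline codec (the `xml` filter could parse each line afterwards). Calling `root.clear()` after every record is what keeps memory roughly bounded by one record rather than by the size of the whole document.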

Attaching my input config.

input {
        stdin { }
        file {
                path => 'input_file'
                start_position => 'beginning'
                sincedb_path => 'NUL'
                codec => multiline {
                        pattern => '^<log'
                        what => 'previous'
                        negate => true
                        auto_flush_interval => 1
                        max_lines => 10000000
                        max_bytes => '600 MiB'
                }
        }
}
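To show why these settings tend to pull a whole document into one event, here is a simplified Python model of the grouping rule for `what => 'previous'`: a line that matches `pattern` (inverted when `negate` is true) is appended to the event being built, and any other line starts a new event. This is an approximation for illustration, not Logstash's implementation; the real codec also enforces `max_bytes` and `auto_flush_interval`, and its exact flushing behaviour differs in detail.

```python
import re
from typing import Iterable, List


def group_lines(lines: Iterable[str], pattern: str, negate: bool,
                max_lines: int) -> List[str]:
    """Simplified model of the multiline codec with what => 'previous'."""
    regex = re.compile(pattern)
    events: List[str] = []
    buffer: List[str] = []
    for line in lines:
        matched = regex.search(line) is not None
        # negate => true inverts the match: NON-matching lines join the previous event.
        joins_previous = matched != negate
        if joins_previous and buffer and len(buffer) < max_lines:
            buffer.append(line)
        else:
            # Flush the event built so far and start a new one with this line.
            if buffer:
                events.append("\n".join(buffer))
            buffer = [line]
    if buffer:
        # End of input: flush what is left (in Logstash, auto_flush_interval plays this role).
        events.append("\n".join(buffer))
    return events


doc = ["<log>", "  <entry>a</entry>", "  <entry>b</entry>", "</log>"]
print(len(group_lines(doc, r"^<log", negate=True, max_lines=10_000_000)))  # 1
```

If, as `max_bytes => '600 MiB'` suggests, the file is a single `<log>` root element, only its first line matches `^<log`, so all 15,000,000 lines are glued into one event that has to fit in the JVM heap at once.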

iam_thadiyan · December 20, 2017, 1:43pm

Any leads for me to pursue? Thanks in advance.

iam_thadiyan · January 1, 2018, 1:18pm

Hi. Any ideas?

system · January 29, 2018, 1:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
