How to effectively use the multiline codec with a crazy long XML file?


(Iam Thadiyan) #1

I have an XML file from a production machine that contains more than 600 MB of XML data (15,000,000 lines).

This file is very important from an analytics point of view, but I find it difficult to push the entire file through the multiline codec in the input because of JVM limitations. I don't know whether I will get a file that is even bigger than this.

Has anyone handled this much data in one file before? If yes, what's the strategy?

Attaching my input config.

input {
        stdin { }
        file {
                path => 'input_file'
                start_position => 'beginning'
                sincedb_path => 'NUL'
                codec => multiline {
                        pattern => '^<log'
                        what => 'previous'
                        negate => 'true'
                        auto_flush_interval => 1
                        max_lines => 10000000
                        max_bytes => '600 MiB'
                }
        }
}
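
To make clear what I expect this codec configuration to do, here is a rough sketch in Python. It is only my understanding of the behaviour of `pattern => '^<log'` with `negate => 'true'` and `what => 'previous'`, not Logstash's actual implementation, and the function name `group_events` is made up for illustration: every line that does not start with `<log` is appended to the previous event.

```python
import re
from typing import Iterable, Iterator


def group_events(lines: Iterable[str], pattern: str = r"^<log") -> Iterator[str]:
    """Group lines into events; a line matching `pattern` starts a new event.

    Illustrative only: this mimics negate => true / what => previous,
    i.e. non-matching lines are attached to the event before them.
    """
    start = re.compile(pattern)
    buffer = []
    for line in lines:
        # A matching line closes the event collected so far.
        if start.search(line) and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    # Flush whatever is left at end of input.
    if buffer:
        yield "".join(buffer)
```

With this logic, only one event is held in memory at a time, so the memory needed depends on the size of the largest event. If only one line in the file starts with `<log`, everything after it ends up in a single event, which is why I set `max_lines` and `max_bytes` so high.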

(Iam Thadiyan) #2

Any leads for me to pursue? Thanks in advance.


(Iam Thadiyan) #3

Hi. Any ideas?

