I have an XML file from a production machine that contains more than 600 MB of XML data (15,000,000 lines).
This file is very important from an analytics point of view, but I am finding it difficult to push the entire file through the multiline codec in the input because of JVM limitations. I also don't know whether I will later get a file that is bigger than this one.
Has anyone handled this much data in a single file before? If yes, what was your strategy?
Attaching my input config.
input {
  stdin { }
  file {
    path => 'input_file'
    start_position => 'beginning'
    sincedb_path => 'NUL'
    codec => multiline {
      pattern => '^<log'
      what => 'previous'
      negate => 'true'
      auto_flush_interval => 1
      max_lines => 10000000
      max_bytes => '600 MiB'
    }
  }
}
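
For clarity, the grouping that this codec configuration asks for (every line that does not start with `<log` is appended to the previous event) can be sketched outside Logstash. The following is only an illustration in Python, not Logstash's implementation; `group_events`, `EVENT_START`, and the sample lines are hypothetical names and data.

```python
import re
from typing import Iterable, Iterator

# Same boundary as the codec setting: pattern => '^<log'
EVENT_START = re.compile(r"^<log")


def group_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield one event per '<log' line together with the lines that follow it.

    Mirrors negate => 'true' with what => 'previous': a line that does NOT
    match the pattern is attached to the event that came before it.
    """
    buffer = []
    for line in lines:
        # A matching line opens a new event, so flush the one collected so far.
        if EVENT_START.match(line) and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    # Flush whatever is left once the input ends.
    if buffer:
        yield "".join(buffer)


# Hypothetical sample data standing in for the real file.
sample = [
    "<log id='1'>\n",
    "  <msg>a</msg>\n",
    "</log>\n",
    "<log id='2'>\n",
    "</log>\n",
]
events = list(group_events(sample))
```

Passing an open file handle instead of `sample` reads the input line by line, so only the event currently being assembled is held in memory rather than the whole file.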