Parsing big XML files in Logstash

I would like to load data from multiple big XML files into Elasticsearch using Logstash. The files are up to 3 GB, have many lines, and all share the same structure. Each file contains a list of elements, and I would like each element to become one event in Elasticsearch.

I was able to configure all the tools to make this work, using the file input with the multiline codec, some filtering, and the elasticsearch output.
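The multiline codec decides where one event ends and the next one begins. As a rough illustration of how a start-tag pattern would split such a file, here is a small plain-Python sketch that mimics `pattern` combined with `negate => true` and `what => "previous"`: every line that does not match the pattern is appended to the event that is currently open. The element name `item` and the pattern are hypothetical, and the `max_lines` handling is only loosely modelled on the codec's per-event cap; this is not Logstash code.

```python
import re
from typing import Iterable, Iterator, List


def group_lines(lines: Iterable[str], start_pattern: str,
                max_lines: int = 500) -> Iterator[List[str]]:
    """Yield groups of lines; a new group starts at each line matching start_pattern.

    Mimics pattern + negate => true + what => "previous": a line that does NOT
    match is appended to the group that is currently open.
    """
    start = re.compile(start_pattern)
    buffer: List[str] = []
    for line in lines:
        # A matching line closes the open group and begins a new one.
        # A buffer that has reached max_lines is flushed as well (loose model
        # of the codec's per-event cap).
        if buffer and (start.search(line) or len(buffer) >= max_lines):
            yield buffer
            buffer = []
        buffer.append(line)
    # Whatever is left at the end of input forms the last group.
    if buffer:
        yield buffer
```

With this grouping, each event only needs to hold one element, so the cap has to exceed the number of lines in a single element rather than in the whole file. Note that the lines before the first `<item>` and the closing root tag land in the first and last groups, so a pipeline would still need to drop or clean those.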

Unfortunately it only works for small files. For bigger files I get errors about unexpected or missing tags. My guess is that this comes from the way the file input reads the file: for example, if it takes only 30 lines at a time and one element of my list spans 20 lines, the remaining 10 lines of the chunk are invalid because a closing tag is missing.
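The guess above can be checked outside Logstash. This plain-Python sketch builds three hypothetical 20-line `<item>` elements, cuts the line stream into fixed 30-line chunks, and tries to parse each chunk with `xml.etree.ElementTree`. Each chunk is wrapped in a dummy `<root>` so that several complete elements in one chunk would still be well-formed; the element and field names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def make_element(i: int, lines_per_element: int = 20) -> List[str]:
    """Build one <item> spanning lines_per_element lines (open tag, fields, close tag)."""
    fields = [f"  <f{j}>{i}</f{j}>" for j in range(lines_per_element - 2)]
    return ["<item>", *fields, "</item>"]


def chunk(lines: List[str], size: int) -> List[List[str]]:
    """Split lines into consecutive fixed-size chunks, ignoring element boundaries."""
    return [lines[k:k + size] for k in range(0, len(lines), size)]


def parses(fragment: List[str]) -> bool:
    """Return True if the fragment is well-formed XML once wrapped in a dummy root."""
    try:
        ET.fromstring("<root>" + "\n".join(fragment) + "</root>")
        return True
    except ET.ParseError:
        return False


lines = [line for i in range(3) for line in make_element(i)]  # 60 lines in total
```

Both 30-line chunks fail: the first leaves the second `<item>` open, and the second starts with that element's orphaned `</item>`. Each element parsed on its own is well-formed.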

How can I parse and transform such files? I experimented with max_bytes and max_lines, but as far as I understand they only consume as many lines or bytes as I specify, so they would not solve the problem unless I set the limit larger than the 3 GB files with millions of lines.
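One workaround that avoids the multiline codec entirely is to stream each XML file once with Python's `xml.etree.ElementTree.iterparse`, emit one JSON object per element, and let Logstash read the result line by line. This is a swapped-in preprocessing technique, not something Logstash does itself. The sketch assumes that each element is a direct child of the root, that the element tag is the hypothetical `item`, that there are no XML namespaces, and that each element's children are flat, uniquely named, text-only fields.

```python
import io
import json
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Optional


def iter_elements(source: IO[bytes], tag: str) -> Iterator[Dict[str, Optional[str]]]:
    """Stream the XML and yield one dict per <tag> element without loading the whole file."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Assumes flat, uniquely named children; repeated tags would overwrite.
            yield {child.tag: child.text for child in elem}
            # Drop already-processed children so memory use stays roughly flat.
            root.clear()


def write_ndjson(source: IO[bytes], out: IO[str], tag: str) -> int:
    """Write one JSON object per line (NDJSON) and return the number of records."""
    count = 0
    for record in iter_elements(source, tag):
        out.write(json.dumps(record) + "\n")
        count += 1
    return count
```

The resulting file could then be read by the file input with a `json` codec, so that each line becomes one event without any multiline grouping.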
