I have millions of lines of logs that I want to index in ES, using Filebeat + Logstash.
Each log line contains an XML document representing a SOAP message that was received by a web service and logged, like in this example:

```
2019-11-13 10:00:1234 <?xml><Envelope><Body><GetUnitByPosition><Position>1234</Position></GetUnitByPosition></Body></Envelope>
2019-11-12 09:30:5678 <?xml><Envelope><Body><GetPositionByName><Name>Position1</Name></GetPositionByName></Body></Envelope>
2019-11-11 08:30:5678 <?xml><Envelope><Body><UpdatePosition><Position>1234567</Position><Name>Position2</Name><Tag>9876</Tag></UpdatePosition></Body></Envelope>
```
In the real case, this web service receives around 100 different XML message types. To make the logs easier to read, I store the parsed XML using the store_xml => true option of the Logstash xml filter, and I also use XPath expressions to extract every single possible element from the XMLs into separate fields (yes, a heck of a tedious job).
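The relevant part of my filter config looks roughly like this (the timestamp-splitting step is omitted, and field names like xml_payload, parsed_xml, and unit_position are just examples):

```
filter {
  # Assume an earlier grok/dissect filter has already split the leading
  # timestamp off into its own field, leaving the raw XML in "xml_payload".
  xml {
    source    => "xml_payload"
    target    => "parsed_xml"   # required when store_xml => true
    store_xml => true
    # One XPath expression per element -- repeated for every element of
    # every one of the ~100 message types. Destination names are mine.
    xpath => {
      "/Envelope/Body/GetUnitByPosition/Position/text()" => "unit_position"
      "/Envelope/Body/GetPositionByName/Name/text()"     => "position_name"
      "/Envelope/Body/UpdatePosition/Tag/text()"         => "update_tag"
    }
  }
}
```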
Creating an XPath expression for each element is a lot of work, but worse, any change to any of the XML schemas (or any new element) means I also have to change the Logstash config. I would like to avoid that.
So I thought of removing the XPaths and working directly with the fields the xml filter plugin generates. However, that brings another problem: since there are so many distinct elements across all the XMLs, I get many error messages saying the limit of 1000 fields has been reached. That's because the filter generates a nested field for every unique element path, so even the three example messages above produce fields like GetUnitByPosition.Position, GetPositionByName.Name, and UpdatePosition.Tag, and across ~100 message types this quickly blows past the limit.
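The no-XPath variant I tried is essentially just this (same hypothetical field names as above):

```
filter {
  xml {
    source      => "xml_payload"
    target      => "parsed_xml"
    store_xml   => true
    force_array => false   # otherwise every element is also wrapped in an array
    # No xpath list: every element of every message type ends up as its own
    # nested field under "parsed_xml", one field per unique element path.
  }
}
```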
So, the question is: does anyone have experience with a situation like this one? Any recommendations? Maybe I am missing something, such as a configuration setting for these limits.
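For reference, I believe the error comes from the index.mapping.total_fields.limit index setting, which defaults to 1000. I know it can be raised per index (the index name below is just an example), but my understanding is that an oversized mapping hurts performance, so simply raising the limit doesn't feel like the real fix:

```
PUT /my-soap-logs/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```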