Hi,
I have millions of lines of logs that I want to index in ES, using Filebeat + Logstash.
Each log line can have a different XML that represents a SOAP message that has been received by a WebService and logged, like in this example:
2019-11-13 10:00:1234 <?xml><Envelope><Body><GetUnitByPosition><Position>1234</Position></GetUnitByPosition></Body></Envelope>
2019-11-12 09:30:5678 <?xml><Envelope><Body><GetPositionByName><Name>Position1</Name></GetPositionByName></Body></Envelope>
2019-11-11 08:30:5678 <?xml><Envelope><Body><UpdatePosition><Position>1234567</Position><Name>Position2</Name><Tag>9876</Tag></UpdatePosition></Body></Envelope>
In the real case, this web service receives around 100 different XML messages, so I wanted to make the logs easier to read by storing the parsed XML, using the store_xml => true option of the xml filter in Logstash. I am also using XPath to extract every single possible element from the XMLs into separate fields (yes, a heck of a tedious job).
Problems are:
- Creating an XPath expression for each element is too much work. Worse, if any of the XML schemas change (or new elements are added), I would have to change the Logstash config as well. I would like to avoid that.
- To avoid that, I thought of removing the XPath expressions and working with the fields that the xml filter plugin generates. However, that brings another problem: because there are so many element combinations across the XMLs, I get many error messages saying the limit of 1000 fields has been reached. That's because it generates the following fields for the example I gave above:
xmldata.Envelope.Body.GetUnitByPosition.Position
xmldata.Envelope.Body.GetPositionByName.Name
xmldata.Envelope.Body.UpdatePosition.Position
xmldata.Envelope.Body.UpdatePosition.Name
xmldata.Envelope.Body.UpdatePosition.Tag
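To show where these names come from, here is a small Python sketch. It is only my approximation of the behaviour, not the xml filter's actual code: it assumes every leaf element becomes one dotted field under the target name xmldata. The helper names leaf_paths and distinct_fields are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Payloads from the three sample log lines (XML declaration omitted,
# closing tags written well-formed so the parser accepts them).
SAMPLES = [
    "<Envelope><Body><GetUnitByPosition><Position>1234</Position>"
    "</GetUnitByPosition></Body></Envelope>",
    "<Envelope><Body><GetPositionByName><Name>Position1</Name>"
    "</GetPositionByName></Body></Envelope>",
    "<Envelope><Body><UpdatePosition><Position>1234567</Position>"
    "<Name>Position2</Name><Tag>9876</Tag></UpdatePosition></Body></Envelope>",
]


def leaf_paths(element, prefix):
    """Yield one dotted field name per leaf element below `element`."""
    path = f"{prefix}.{element.tag}"
    children = list(element)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)


def distinct_fields(xml_strings, target="xmldata"):
    """Collect the distinct dotted field names across all payloads."""
    fields = set()
    for xml_string in xml_strings:
        fields.update(leaf_paths(ET.fromstring(xml_string), target))
    return sorted(fields)


if __name__ == "__main__":
    for name in distinct_fields(SAMPLES):
        print(name)
```

Three message types already produce five distinct fields because the operation name is part of every path, so with around 100 message types the number of fields grows with every operation and every element in it.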
So, the question is: does anyone have experience with a similar situation? Any recommendations? Maybe I am missing something, such as a setting for the field limit.
Thanks!