Configuring Logstash to index nested xml data

With Logstash, is there a way to index nested serialized XML using the XML plugin? Can you provide an example?

We have some data that we migrate from an existing SQL database via Logstash. These records have a handful of fixed, well-defined fields (time_stamp, event_type, call_id, host_server, etc.). There is also one field called “data” that contains XML, and its elements and attributes vary based on the “event_type” field.

Using the xml plugin, I created a filter in my Logstash configuration and it’s generally working. We have additional fields in some records; they’re properly populated, and I can search on them.

In my Logstash config file I use:

filter {
  xml {
    source => "data"
    target => "data_xml"
    store_xml => true
  }
}

However, one event type, which records an internal web service call, parses into data_xml fine and produces a new field called “data_xml.Data_Out”. “Data_Out” represents the data returned from the service call and is itself another set of XML data. I want to index this Data_Out as well.

Note that when viewing the original “Data” field, we see the individual XML fields, including “Data_Out”, but the content of “Data_Out” is serialized (escaped) XML, with “&lt;” and “&gt;” embedded in it. The data_xml.Data_Out field, however, shows the string with the actual “<” and “>” characters. I assume I can index this as well, but so far I’ve not been successful.

We have many such back-end systems, with tens to hundreds of method calls for each. There is a large number of result set patterns in this “Data_Out” field, so I’m looking for Logstash to index these fields as they’re ingested.

Assuming this can be done in the same filter configuration with the xml plugin, I want to continue to index Data => data_xml, but additionally index data_xml.Data_Out => data_xml.data_out_xml. Logically, the result seems to belong under the “data_xml” set as well, but I’m not committed to that.

While I have searched these boards and found topics pertaining to "nested xml", I'm not seeing one that covers the serialized "&lt;" content or includes a sample of how the Logstash filter should be set up.

It may well be my lack of knowledge/experience, as I am new to ELK, but I think an example would go a long way to explain how it's done. Can you offer a suggestion for how I should configure the filter to do this? Or is there some other way it should be done?

Thank you

To clarify, when we index the Data field, the data_xml.Data_Out field is deserialized, so it now has the proper "<" and ">" characters. I'm looking to index its contents as new fields as well.
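
One possible approach, sketched here and not tested against this data, is to chain a second xml filter that reads the already-unescaped string produced by the first one. The sketch assumes that data_xml.Data_Out ends up as a one-element array holding the XML string (which is what the xml filter's default force_array => true would normally produce) and that the string is a well-formed document with a single root element. The target name data_out_xml is just the name proposed above.

```
filter {
  # First pass: parse the outer XML in "data" (same as the existing filter).
  xml {
    source => "data"
    target => "data_xml"
    store_xml => true
  }

  # Second pass: run only for events whose outer XML yielded a Data_Out value.
  if [data_xml][Data_Out] {
    xml {
      # Assumption: Data_Out is a one-element array, so [0] selects the
      # unescaped XML string. If force_array => false is set on the first
      # filter, the source would likely be "[data_xml][Data_Out]" instead.
      source => "[data_xml][Data_Out][0]"
      target => "[data_xml][data_out_xml]"
      store_xml => true
    }
  }
}
```

If the inner string cannot be parsed, the xml filter tags the event with _xmlparsefailure, which should make it easy to find event types whose Data_Out is not a complete XML document.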