Xml filter on nested element

Hello,
I am using the jdbc streaming plugin to add a new field to a record. This new field is an array of objects (history), as seen below:

    record: {
    	id: "1",
    	someXMLField: <some xml value here>,
    	history: [
    		{
    			fieldA: "sdfsdf"
    			fieldB: "dfdfgfd"
    			fieldC: <some xml value here>
    		},
    		{
    			fieldA: "yyy"
    			fieldB: "fff"
    			fieldC: <some xml value here>
    		}
    	]
    }

I would like to run an xml filter on fieldC and replace it with something like:

    record: {
    	id: "1",
    	someXMLField: <some xml value here>,
    	history: [
    		{
    			fieldA: "sdfsdf"
    			fieldB: "dfdfgfd"
    			fieldCReplacement: {
    				"field1fromXML": "aaa",
    				"field2fromXML": "bbb",
    			}
    		},
    		{
    			fieldA: "yyy"
    			fieldB: "fff"
    			fieldCReplacement: {
    				"field1fromXML": "aaa",
    				"field2fromXML": "bbb",
    			}
    		}
    	]
    }

I can run the xml filter on someXMLField and use mutate to add the desired structure to the main record, but I cannot figure out how to do the same transformation on a nested array like the one above.
Any ideas?

You could write a ruby filter that iterates over the [record][history] field and parses the [fieldC] string on each entry. If you read the source for the xml filter then you will see that the store_xml => true code path is actually pretty simple, especially since you can remove most of the option handling and just hard-code the options you want.
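The logic such a ruby filter would need can be sketched outside Logstash as plain Ruby. This is only an illustration of the iterate-and-parse approach, using stdlib REXML instead of the Nokogiri library that the xml filter uses internally; the field names (`field1C`, `field1CReplacement`, `xmlRootTag`, the attribute names) are hypothetical placeholders:

```ruby
require 'rexml/document'

# Sketch of the suggested approach: walk the history array, parse the
# XML string in each entry, and attach a fresh replacement hash per entry.
def transform_history(history)
  history.map do |entry|
    next entry unless entry['field1C']
    doc = REXML::Document.new(entry['field1C'])
    root = doc.root
    # Build a new hash for every entry so entries do not share state.
    entry.merge(
      'field1CReplacement' => {
        'attr1' => root.attributes['attr1'],
        'attr2' => root.attributes['attr2']
      }
    )
  end
end

history = [
  { 'fieldA' => 'sdfsdf', 'field1C' => '<xmlRootTag attr1="aaa" attr2="bbb"/>' },
  { 'fieldA' => 'yyy',    'field1C' => '<xmlRootTag attr1="ccc" attr2="ddd"/>' }
]
result = transform_history(history)
```

Inside a real Logstash ruby filter you would fetch the array with `event.get`, transform it, and write it back with `event.set` (and handle namespaces if your documents use them, which REXML does not strip for you).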

Alternatively, you can use a split filter to create multiple events, each of which has a [record][history][fieldC] string, then use an xml filter configured any way you want, including xpath.

You might want to use xpath to extract elements or attributes of interest and save them in [@metadata][fieldC][someElement], then use mutate+copy (not mutate+rename) to copy [@metadata][fieldC] to [record][history][fieldC]. If you really want the replacement field to be [record][history][fieldCReplacement] (i.e. a different name) there is no need to do this and you can use xpath to do the extraction directly.
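The @metadata variant described above might look something like the following sketch. The field names and xpath expressions are placeholders, and note that depending on the Logstash version the xml filter may store xpath results as arrays of strings:

```
xml {
    store_xml => false
    remove_namespaces => true
    source => "[history][fieldC]"
    xpath => [
        "/aa/@attr1", "[@metadata][fieldC][attr1]",
        "/aa/@attr2", "[@metadata][fieldC][attr2]"
    ]
}
mutate {
    # copy (not rename) so the extracted values overwrite the XML string
    copy => { "[@metadata][fieldC]" => "[history][fieldC]" }
}
```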

Once the [record][history] fields look the way you want then you can recombine them using an aggregate filter. This post has an example of how to do that.
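In the spirit of that linked post, the recombination step could be sketched roughly as below. This assumes each split event still carries the original [id] field to group on; the timeout value is arbitrary:

```
aggregate {
    task_id => "%{id}"
    code => "
        map['history'] ||= []
        map['history'] << event.get('history')
        event.cancel
    "
    push_map_as_event_on_timeout => true
    timeout => 3
    timeout_task_id_field => "id"
}
```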

I was able to implement a split filter followed by the xml filter to create the structure in a renamed field, as you suggested.

    split {
        field => "history"
    }
    xml {
        store_xml => false
        remove_namespaces => true
        source => "[history][fieldC]"
        xpath => [
            "/aa/@attr1", "[history][fieldCReplacement][attr1name]",
            "/aa/@attr2", "[history][fieldCReplacement][attr2name]"
        ]
    }

The only issue I notice is that only the first record in the history array is converted; I lose all the other records in the history array.

Without seeing the complete configuration and example data there is no way we can diagnose that. For example, if you are using an elasticsearch output and setting the document_id option so that the id is the same for each member of the array then you will only get one member. But that is just one of an infinite number of possibilities.

Yes, you are correct. The output to elasticsearch does have the document_id setting. And it is the same for each of the history records.
What would be a good option to get around this? (I will try to get the entire config posted here; it will take some time to do that properly.)

You could use an aggregate filter to recombine the members of the array (as in the post I linked to before).

Alternatively, you could modify the document id by appending the array index to it. Off the top of my head I cannot think of a simple way of doing that.

So, I realised that it is not easy to do this. I came up with the following code to try to update the history array:

    ruby {
        init => "require 'nokogiri'"
        code => "
            for currHistory in event.get('history') do
                if currHistory['fieldC'] != nil
                    value = currHistory['fieldC']

                    doc = Nokogiri::XML::Document.parse(value, nil, value.encoding.to_s)
                    doc.remove_namespaces!

                    currHistory['fieldCReplacement'] = {}
                    attr1text = doc.xpath('/RootClass/@attr1').text
                    currHistory['fieldCReplacement']['attr1'] = attr1text
                    attr2text = doc.xpath('/RootClass/@attr2').text
                    currHistory['fieldCReplacement']['attr2'] = attr2text
                end
            end
        "
    }

Now that I have the values attr1text and attr2text, I am not sure how to add them as a new field on the individual history array elements.

What did you not like about the method of recombining the history array after the split using an aggregate filter that I linked to?

Sorry, I did not respond to that directly. I did try that out exactly as you instructed: I added a split filter and an xml filter to add the new converted field. While that partially worked, the result was that the number of documents indexed in Elasticsearch multiplied, since each document ended up with only one nested history object per record. I also tried removing the document_id option.
To be fair, I did not present the larger picture of how this is configured (I am not sure I can post the whole config, I apologize). But at a high level the config works like this:

JDBC Input
-- Filters
-- Aggregate filter to consolidate the records
-- jdbc streaming to add a new nested object array (history) from a different SQL query
-- update the history array (one specific field) and expand it into a new nested field

So, after trying out a few variations, I thought writing a ruby filter to manually process the history array would help. I looked at the xml filter code to get some ideas.

You are missing the part about recombining the history array using an aggregate filter. Read through the post I linked to.

Oh yes, I did read the post and try it out. So the attempt looked something like this:

JDBC Input
-- Filters
-- Aggregate filter to consolidate the records
-- jdbc streaming to add a new nested object array (history) from a different sql
-- split filter on history array
-- xml filter to transform field1C to add a new element field1CReplacement
-- Aggregate filter (again)

It did not work for me earlier today. I guess I did not configure it properly. I will attempt this again tomorrow morning and post an update here.
thanks !!!

I managed to solve the problem. I tried the split/aggregate solution the whole day, but could not get it working as expected.
However, the first option you presented was a success: a ruby filter, inspired by the post you linked and
this post, got me to the final solution.

    ruby {
        init => "require 'nokogiri'"
        code => "
            oldHistory = event.get('history')
            newHistory = []
            oldHistory.each { |x|
                if x.include? 'field1C'
                    value = x['field1C']
                    doc = Nokogiri::XML::Document.parse(value, nil, value.encoding.to_s)
                    doc.remove_namespaces!

                    # build a fresh hash for every entry; a single hash shared
                    # across iterations would keep only the first entry's values
                    field1CReplacement = {}
                    field1CReplacement['attr1'] = doc.xpath('/xmlRootTag/@attr1').text
                    field1CReplacement['attr2'] = doc.xpath('/xmlRootTag/@attr2').text

                    x['field1CReplacement'] = field1CReplacement
                end
                newHistory << x
            }
            event.set('history', newHistory)
        "
    }

Thank you for your guidance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.