I can run the xml filter on someXMLField and use mutate to add the desired structure to the main record. What I cannot figure out is how to apply the same transformation to each entry of a nested array like the one above.
Any ideas?
You could write a ruby filter that iterates over the [record][history] field and parses the [fieldC] string on each entry. If you read the source for the xml filter then you will see that the store_xml => true code path is actually pretty simple, especially since you can remove most of the option handling and just hard-code the options you want.
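As a rough illustration of the ruby filter option, here is a minimal sketch in plain Ruby. It uses REXML rather than the XmlSimple library that the xml filter itself uses. The helper names (`element_to_hash`, `expand_history`) and the default target key `fieldCReplacement` are assumptions made for the example, not part of any plugin.

```ruby
require 'rexml/document'

# Convert an XML element into nested Hashes and Strings.
# Attributes are ignored; repeated child names are collected into an Array.
def element_to_hash(element)
  return element.text.to_s.strip unless element.has_elements?

  result = {}
  element.elements.each do |child|
    value = element_to_hash(child)
    if result.key?(child.name)
      result[child.name] = [result[child.name]] unless result[child.name].is_a?(Array)
      result[child.name] << value
    else
      result[child.name] = value
    end
  end
  result
end

# Return a copy of the history array in which every entry that carries a
# parseable XML string under source_key also carries the parsed structure
# under target_key. Entries that cannot be parsed are returned unchanged.
def expand_history(history, source_key: 'fieldC', target_key: 'fieldCReplacement')
  history.map do |entry|
    xml = entry[source_key]
    next entry unless xml.is_a?(String) && !xml.strip.empty?

    begin
      root = REXML::Document.new(xml).root
      next entry if root.nil?

      entry.merge(target_key => element_to_hash(root))
    rescue REXML::ParseException
      entry
    end
  end
end

# Made-up sample data standing in for the [record][history] array.
history = [
  { 'fieldA' => 1, 'fieldC' => '<root><name>alpha</name><size>3</size></root>' },
  { 'fieldA' => 2, 'fieldC' => '<root><name>beta</name></root>' }
]
expanded = expand_history(history)
```

Inside a ruby filter the same helpers would presumably be wrapped as `event.set('[record][history]', expand_history(event.get('[record][history]')))`, so the array is rewritten in place without splitting the event.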
Alternatively, you can use a split filter to create multiple events, each of which has a [record][history][fieldC] string, then use an xml filter configured any way you want, including xpath.
You might want to use xpath to extract elements or attributes of interest and save them in [@metadata][fieldC][someElement], then use mutate+copy (not mutate+rename) to copy [@metadata][fieldC] to [record][history][fieldC]. If you really want the replacement field to be [record][history][fieldCReplacement] (i.e. a different name) there is no need to do this and you can use xpath to do the extraction directly.
Once the [record][history] fields look the way you want then you can recombine them using an aggregate filter. This post has an example of how to do that.
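To make the recombination step concrete, here is a plain Ruby sketch of what it amounts to. It is not the aggregate filter itself, which works across separate events using a task id and a timeout. Each hash stands in for one event produced by the split filter, and the `id` key used for grouping is a hypothetical field name.

```ruby
# Group split events by record id and rebuild one record per id whose
# history is again an array, in the order the events arrived.
def recombine_history(split_events, id_key: 'id')
  split_events
    .group_by { |event| event['record'][id_key] }
    .map do |_id, group|
      record = group.first['record'].dup
      record['history'] = group.map { |event| event['record']['history'] }
      { 'record' => record }
    end
end

# Made-up sample: record 7 was split into two events, record 8 into one.
split_events = [
  { 'record' => { 'id' => 7, 'history' => { 'fieldC' => 'first' } } },
  { 'record' => { 'id' => 7, 'history' => { 'fieldC' => 'second' } } },
  { 'record' => { 'id' => 8, 'history' => { 'fieldC' => 'only' } } }
]
recombined = recombine_history(split_events)
```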
Without seeing the complete configuration and example data there is no way we can diagnose that. For example, if you are using an elasticsearch output and setting the document_id option so that the id is the same for each member of the array then you will only get one member. But that is just one of an infinite number of possibilities.
Yes, you are correct. The output to elasticsearch does have the document_id setting. And it is the same for each of the history records.
What would be a good option to get around this? (I will try to get the entire config posted here. It will take some time to get that done properly.)
You could use an aggregate filter to recombine the members of the array (as in the post I linked to before).
Alternatively, you could modify the document id by appending the array index to it. Off the top of my head I cannot think of a simple way of doing that.
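One possible approach, sketched in plain Ruby, is to record each entry's position in the array before the split so that the position survives into the split events. This is an untested idea rather than an established recipe, and the `index` key and the base id value are hypothetical names.

```ruby
# Return a copy of the history array in which every entry records its
# own position under the 'index' key.
def add_history_index(history)
  history.each_with_index.map { |entry, index| entry.merge('index' => index) }
end

# Build a per-entry document id from the record's id and the stored index.
def document_id_for(base_id, history_entry)
  "#{base_id}-#{history_entry['index']}"
end

history = add_history_index([{ 'fieldC' => '<a/>' }, { 'fieldC' => '<b/>' }])
ids = history.map { |entry| document_id_for('42', entry) }
# ids is ["42-0", "42-1"]
```

In a pipeline, the indexing step would run in a ruby filter before the split, and the output could then build the id with a sprintf reference to the stored index. Note that this yields one document per history entry rather than one document with a nested array.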
Sorry, I did not respond to that directly. I did try that out exactly as you instructed. I added a split filter and an xml filter to add the new converted field. That worked only partially: the number of documents indexed in Elasticsearch multiplied, because each document ended up with only one history nested object. I also tried removing the document_id setting.
To be fair, I did not present the larger picture of how this is configured (I am not sure I can post the whole config, I apologize). At a high level, the config works like this:
JDBC Input
-- Filters
-- Aggregate filter to consolidate the records
-- jdbc streaming to add a new nested object array (history) from a different sql
-- update the history array (one specific field) and expand it into a new nested field
So, after trying out a few variations, I thought writing a ruby filter to manually process the history array would help. I looked at the xml filter code to get some ideas.
Oh yes, I did read the post and try it out. So the attempt looked something like this:
JDBC Input
-- Filters
-- Aggregate filter to consolidate the records
-- jdbc streaming to add a new nested object array (history) from a different sql
-- split filter on history array
-- xml filter to transform the field1C to add a new element field1cReplacement
-- Aggregate filter (again)
It did not work for me earlier today. I guess I did not configure it properly. I will attempt this again tomorrow morning and update here.
Thanks!
I managed to solve the problem. I tried the split/aggregate solution the whole day, but it did not end up working as expected.
But the first option you presented was a success. A ruby filter, inspired by the post you linked and by this post, got me to the final solution.