I am receiving XML in Logstash via an http input. Each message can contain one or more documents that need to be indexed separately, and that part is working. What I need help with is that within each document there are one or more elements with the same name but different values, and those values need to be put into an array. Below is an example of the XML message:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<crawl-urls>
<crawl-url enqueue-type="reenqueued" forced-vse-key="GROUP_NAME_1" forced-vse-key-normalized="forced-vse-key-normalized" status="complete" synchronization="indexed" url="GROUP_NAME">
<curl-option name="default-allow">allow</curl-option>
<crawl-data content-type="application/vxml-unnormalized" encoding="xml">
<document>
<content name="GroupName">GROUP_NAME_1</content>
<content name="QID">ABCD12</content>
<content name="QID">ABCD13</content>
<content name="QID">ABCD14</content>
</document>
</crawl-data>
</crawl-url>
<crawl-url enqueue-type="reenqueued" forced-vse-key="GROUP_NAME_2" forced-vse-key-normalized="forced-vse-key-normalized" status="complete" synchronization="indexed" url="GROUP_NAME">
<curl-option name="default-allow">allow</curl-option>
<crawl-data content-type="application/vxml-unnormalized" encoding="xml">
<document>
<content name="GroupName">GROUP_NAME_2</content>
<content name="QID">ABCD22</content>
<content name="QID">ABCD23</content>
<content name="QID">ABCD24</content>
<content name="QID">ABCD25</content>
</document>
</crawl-data>
</crawl-url>
</crawl-urls>
The output needs to be:
[{"GroupName": "GROUP_NAME_1"}, {"QID": ["ABCD12", "ABCD13", "ABCD14"]}]
[{"GroupName": "GROUP_NAME_2"}, {"QID": ["ABCD22", "ABCD23", "ABCD24", "ABCD25"]}]
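To make the grouping I am after concrete, here is a minimal sketch of the transformation in Python rather than Logstash config. It only illustrates the intended logic and is not a Logstash solution. The function name `documents_to_records`, the `always_list` parameter, and the trimmed-down `SAMPLE` (attributes and the `curl-option` elements dropped) are made up for this illustration, and it assumes `QID` should stay an array even when a document has only one.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed version of the message above (assumption: the dropped attributes
# and <curl-option> elements do not matter for the grouping).
SAMPLE = """<crawl-urls>
  <crawl-url url="GROUP_NAME">
    <crawl-data>
      <document>
        <content name="GroupName">GROUP_NAME_1</content>
        <content name="QID">ABCD12</content>
        <content name="QID">ABCD13</content>
        <content name="QID">ABCD14</content>
      </document>
    </crawl-data>
  </crawl-url>
  <crawl-url url="GROUP_NAME">
    <crawl-data>
      <document>
        <content name="GroupName">GROUP_NAME_2</content>
        <content name="QID">ABCD22</content>
        <content name="QID">ABCD23</content>
        <content name="QID">ABCD24</content>
        <content name="QID">ABCD25</content>
      </document>
    </crawl-data>
  </crawl-url>
</crawl-urls>"""


def documents_to_records(xml_text, always_list=("QID",)):
    """Return one record per <document>, grouping repeated <content> names."""
    root = ET.fromstring(xml_text)
    records = []
    for doc in root.iter("document"):
        grouped = {}
        # Collect every value under its name attribute, preserving order.
        for content in doc.findall("content"):
            grouped.setdefault(content.get("name"), []).append(content.text)
        record = []
        for name, values in grouped.items():
            # Hypothetical rule: names in always_list, or names that repeat,
            # stay arrays; anything else collapses to a single value.
            keep_list = name in always_list or len(values) > 1
            record.append({name: values if keep_list else values[0]})
        records.append(record)
    return records


# One JSON line per document, in the shape shown above.
for record in documents_to_records(SAMPLE):
    print(json.dumps(record))
```

Each `<document>` becomes its own record, which matches the "indexed separately" requirement, and the repeated `QID` values end up in a single array per document.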