Dynamic XPath creation by parsing fields in the xml filter

Below is the sample xml

<xml-content>
 <url-group>
    <item id="-971421122">
      <name>http://10.115.88.123:88/category/view</name>
    </item>
</url-group>
 <issue-group total="1">
    <item id="-5287023434033995264">
        <url>
        <ref>-971421122</ref>
      </url>
    </item id>
</issue-group>
<xml-content>

From the above XML snippet, I am using split on issue-group and need the URL name from url-group, so that each event has (id, urlname), matched on the ref id, which is the common field.

Please keep in mind there will be multiple items.

I tried the following:

filter {
  split {
    field => "issueitem"
  }
  xml {
    source => "issueitem"
    store_xml => false
    xpath => [
      "/item/@id", "issueid",
      "/item/url/ref/text()", "urlid"
    ]
  }
  xml {
    source => "urlgroup"
    store_xml => false
    xpath => [
      "/url-group/item[@id='%{urlid}']/name/text()", "urlname"
    ]
  }
}

With a static value like the one below, it works:
/url-group/item[@id='-971421122']/name/text()
Please help.

Please keep in mind there will be multiple items with corresponding ids; each event should have its own id and urlname.

@magnusbaeck please help

Please do not ping people not already involved in the thread. This forum is manned by volunteers, so please also be patient. If you have not received any response within a few business days it is usually considered OK to bump your thread.

As far as I can see that does not look like valid XML. Where are the closing tags for xml-content, url-group and issue-group?

Updated, Christian. I hope you have understood the problem.

What do you expect the generated documents to look like?

How can I pass %{urlid} in the xpath?
The following does not work:
/url-group/item[@id='%{urlid}']/name/text()

This static version works:
/url-group/item[@id='-971421122']/name/text()
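A note on the %{urlid} question itself: the behaviour reported here (the static id matches, the %{urlid} form does not) is consistent with the xml filter passing xpath expressions to the parser literally, without %{field} substitution. One possible workaround is to do the lookup in a ruby filter. The sketch below assumes that the event already carries urlid and urlgroup fields as in the config above, and that Nokogiri can be required because it is a dependency of the xml filter plugin. Treat it as an untested illustration, not a confirmed solution.

```
filter {
  ruby {
    # Assumption: nokogiri is on the load path because the xml filter depends on it
    init => "require 'nokogiri'"
    code => '
      ref = event.get("urlid")
      ref = ref.first if ref.is_a?(Array)   # xpath results are stored as arrays
      doc = Nokogiri::XML(event.get("urlgroup").to_s)
      # Compare the id attribute in Ruby instead of interpolating it into the XPath
      item = doc.xpath("/url-group/item").find { |i| i["id"] == ref }
      name = item && item.at_xpath("name")
      event.set("urlname", name.text) if name
    '
  }
}
```

If no item matches, urlname is simply left unset, so such events can be spotted with a conditional on that field.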

That does not really answer my question. What would the document(s) generated from the sample data above look like?

I need the urlname in the event as well. Can you see the updated question?

I still do not understand what you want the output to look like.

I corrected a few things in your sample and ran it through this config:

input {
  generator {
    lines => ['<xml-content>
 <url-group>
    <item id="-971421122">
      <name>http://10.115.88.123:88/category/view</name>
    </item>
</url-group>
 <issue-group total="1">
    <item id="-5287023434033995264">
        <url>
        <ref>-971421122</ref>
      </url>
    </item>
</issue-group>
</xml-content>']
    count => 1
  } 
} 

filter {
  xml {
    source => "message"
    target => "data"
    remove_field => ["message"]
  }
}

output {
  stdout { codec => json_lines }
}

The result was:

{
	"host": "localhost",
	"@timestamp": "2019-01-08T17:44:50.089Z",
	"@version": "1",
	"sequence": 0,
	"data": {
		"url-group": [{
			"item": [{
				"id": "-971421122",
				"name": ["http://10.115.88.123:88/category/view"]
			}]
		}],
		"issue-group": [{
			"item": [{
				"id": "-5287023434033995264",
				"url": [{
					"ref": ["-971421122"]
				}]
			}],
			"total": "1"
		}]
	}
}

How does this compare to what you want inserted into Elasticsearch?

You still do not have valid XML. If your XML looked like this:

<xml-content>
    <url-group>
        <item id="-37">
            <name>first</name>
        </item>
        <item id="-38">
            <name>second</name>
        </item>
    </url-group>
    <issue-group total="1">
        <item id="-3">
            <url>
                <ref>-37</ref>
            </url>
        </item>
        <item id="-22">
            <url>
                <ref>-38</ref>
            </url>
        </item>
    </issue-group>
</xml-content>

Then, if the entire text is in a single event, this would work:

xml { source => "message" target => "theXML" }
split { field => "[theXML][url-group][0][item]" }
split { field => "[theXML][issue-group][0][item]" }
if [theXML][url-group][0][item][id] != [theXML][issue-group][0][item][url][0][ref][0] { drop {} }

That would get you two events, one of which has name=first and id="-3", and the other has name=second and id="-22". But you are forcing us to guess, which is not good.

Thanks Badger, you are close. Let me put it this way, as I am a newbie. Thanks for your patience.

<xml-content>
    <url-group>
        <item id="-37">
            <name>first</name>
        </item>
        <item id="-38">
            <name>second</name>
        </item>
    </url-group>
    <issue-group total="1">
        <item id="-3">
			severity>low</severity>
		    <remediation>
				<ref>fix_52741</ref>
			</remediation>
            <url>
                <ref>-37</ref>
            </url>
        </item>
        <item id="-22">
			severity>high</severity>
			<remediation>
				<ref>fix_52742</ref>
			</remediation>
            <url>
                <ref>-38</ref>
            </url>
        </item>
    </issue-group>
</xml-content>

1. The number of events will equal the number of items in issue-group.
2. Each event should contain the issue-group fields (severity, remediation/ref, url/ref) and the corresponding name from url-group.

So from the above, 2 events will be created:
1st event (severity = low, remediation = fix_52741, url = -37, name = first)
2nd event (severity = high, remediation = fix_52742, url = -38, name = second)

Once you add an opening < to the severity tags, the filter I gave you will produce

    "theXML" => {
    "issue-group" => [
        [0] {
             "item" => {
                "remediation" => [
                    [0] {
                        "ref" => [
                            [0] "fix_52742"
                        ]
                    }
                ],
                   "severity" => [
                    [0] "high"
                ],
                         "id" => "-22",
                        "url" => [
                    [0] {
                        "ref" => [
                            [0] "-38"
                        ]
                    }
                ]
            },
            "total" => "1"
        }
    ],
      "url-group" => [
        [0] {
            "item" => {
                  "id" => "-38",
                "name" => [
                    [0] "second"
                ]
            }
        }
    ]
},

You can use mutate+rename to move those fields up to the top level. Stuff like

mutate { rename => { "[theXML][issue-group][0][item][remediation][0][ref][0]" => "remediation" } }

2 events will be created: 1st event (severity = low, remediation = fix_52741, url = -37, name = first), 2nd event (severity = high, remediation = fix_52742, url = -38, name = second)

So to achieve this, you want me to add both the split and the rename as mentioned in your earlier replies, right?

Yes, use

xml { source => "message" target => "theXML" }
split { field => "[theXML][url-group][0][item]" }
split { field => "[theXML][issue-group][0][item]" }
if [theXML][url-group][0][item][id] != [theXML][issue-group][0][item][url][0][ref][0] { drop {} }

mutate { rename => { "[theXML][issue-group][0][item][remediation][0][ref][0]" => "remediation" } }

Of course you will need additional entries in the rename hash to deal with the other fields.
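For illustration, a fuller rename hash might look like the sketch below. The source paths are read off the rubydebug output shown earlier in the thread. The target names (issueid, severity, remediation, url, name) are assumptions based on the fields asked for, and removing theXML and message at the end is an optional extra, not something the thread requires.

```
filter {
  xml { source => "message" target => "theXML" }

  # One event per url-group item, then one per issue-group item (a cross product)
  split { field => "[theXML][url-group][0][item]" }
  split { field => "[theXML][issue-group][0][item]" }

  # Keep only the pairs where the issue's url/ref matches the url-group item id
  if [theXML][url-group][0][item][id] != [theXML][issue-group][0][item][url][0][ref][0] {
    drop {}
  }

  # Move the nested values up to the top level
  mutate {
    rename => {
      "[theXML][issue-group][0][item][id]"                     => "issueid"
      "[theXML][issue-group][0][item][severity][0]"            => "severity"
      "[theXML][issue-group][0][item][remediation][0][ref][0]" => "remediation"
      "[theXML][issue-group][0][item][url][0][ref][0]"         => "url"
      "[theXML][url-group][0][item][name][0]"                  => "name"
    }
  }

  # Optional: drop the parsed tree and the raw text once the fields are extracted
  mutate { remove_field => ["theXML", "message"] }
}
```

Because the two splits build every url/issue combination before the drop, the number of intermediate events grows with the product of the two item counts, which may matter for large documents.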

[2019-01-09T00:32:29,586][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[theXML][url-group][0][item] is of type = NilClass
[2019-01-09T00:32:29,587][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[theXML][url-group][0][item] is of type = NilClass
[2019-01-09T00:32:29,587][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[theXML][issue-group][0][item] is of type = NilClass
[2019-01-09T00:32:29,587][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[theXML][issue-group][0][item] is of type = NilClass

The split fails; the fields are not visible in the event.

What do you get if you use

output { stdout { codec => rubydebug } }
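Some context on the warnings: "is of type = NilClass" means the field handed to split does not exist on the event. Two possible causes, both guesses since the input section was never posted: the xml filter failed to parse message (it then tags the event with _xmlparsefailure), or message holds only one line of the file rather than the whole document. The sketch below assumes a file input with a hypothetical path and shows one way to check for both. The multiline pattern assumes every document starts with <xml-content> at the beginning of a line.

```
input {
  file {
    path => "/path/to/sample.xml"        # hypothetical path
    start_position => "beginning"
    # Join all lines up to the next <xml-content> into one event
    codec => multiline {
      pattern => "^<xml-content>"
      negate => true
      what => "previous"
      auto_flush_interval => 2           # flush the last document after 2 idle seconds
    }
  }
}

filter {
  xml { source => "message" target => "theXML" }

  if "_xmlparsefailure" in [tags] or ![theXML][issue-group] {
    # Leave the event unsplit so the raw message can be inspected in the output
    mutate { add_tag => ["xml_not_usable"] }
  } else {
    split { field => "[theXML][url-group][0][item]" }
    split { field => "[theXML][issue-group][0][item]" }
  }
}

output { stdout { codec => rubydebug } }
```

Events tagged xml_not_usable (a made-up tag name) show exactly what the xml filter received, which should make it clear whether the document arrived whole.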
