XML parsing issue with XPath

Hi there!

I am new to Logstash grok, especially with XML. My task is parsing XML. I successfully wrote a grok and xml filter for one specific XML data format, but then I learned that the data (XML) also comes in different formats. For example, some sample XML has new attributes, which caused my parsing to fail. I want to know two things:
1. How can I make my XML filter run only if my grok match succeeded?
2. Is there a way, in the XML filter, to use XPath to check whether a specific tag exists or not?

My Logstash config file:

grok {
  match => {
    "message" => [
      "%{LOGLEVEL:logtype}\s*%{TIMESTAMP_ISO8601:coffii_timestamp}\s*\[%{GREEDYDATA:task_descriptor}\]\s*\[%{WORD:service_name},%{NUMBER:invocation_id},%{GREEDYDATA:jms_message_id},%{GREEDYDATA:esb_conversation_id}\]\s*%{GREEDYDATA:coffii_message}",
      "%{LOGLEVEL:logtype}\s*%{TIMESTAMP_ISO8601:coffii_timestamp}\s*\[%{GREEDYDATA:task_descriptor}\]\s*\[%{WORD:service_name}?,%{NUMBER:invocation_id}?,%{GREEDYDATA:jms_message_id}?,%{GREEDYDATA:esb_conversation_id}?\]\s*%{GREEDYDATA:coffii_message}\]",
      "%{LOGLEVEL:logtype}\s*%{TIMESTAMP_ISO8601:coffii_timestamp}\s*\[%{GREEDYDATA:task_descriptor}\]\s*\[%{WORD:service_name}?,%{NUMBER:invocation_id}?,%{GREEDYDATA:jms_message_id}?,%{GREEDYDATA:esb_conversation_id}?\]\s*"
    ]
  }
}
grok {
  match => ["coffii_message", "(?<inxmldata><env:Envelope(.|\r|\n)*)"]
}
xml {
  source => "inxmldata"
  target => "parsed_xml"
  store_xml => true
  remove_namespaces => true
  xpath => ["/Envelope/Header/Body/ConfirmScheduleRequest/Location_ID/text()", "location_id"]
  force_array => false
}
mutate {
  add_field => {
    "Location_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Location_ID]}"
    "Reservation_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Reservation_ID]}"
  }
  remove_field => ["inxmldata", "parsed_xml"]
}

As you can see in my configuration file, I first grok

match => ["coffii_message", "(?<inxmldata><env:Envelope(.|\r|\n)*)"]

and then I want to run the XML filter, but right now the XML filter runs for every format of data (whether or not XML exists).
Secondly, I have two XPath tree structures in the XML for fetching "Location_ID":
/Envelope/Header/Body/ConfirmScheduleRequest/Location_ID/
/Envelope/Header/Body/QueryScheduleRequest/Location_ID/

I want to know a way, in the XML filter, to use XPath to check whether a tag exists or not.

Thanks in advance,
Kamran

Don't use more than one GREEDYDATA pattern in a single grok expression. It's bad for performance and could lead to unexpected matches.

First I want to know how to run my XML filter only if my grok match succeeds.

You could use a conditional to only run a certain filter if a field exists.

if [fieldname] {
  xml {
    source => "fieldname"
    ...
  }
}
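A slightly more explicit variant of the same idea, sketched under the assumption that the XML payload is always extracted by the second grok: give that grok its own failure tag and wrap the xml filter in a tag check. The tag name `_no_xml_payload` is a hypothetical choice; grok's default failure tag is `_grokparsefailure`, and `tag_on_failure` overrides it so that failures of the first grok aren't confused with "no XML in this event".

```
filter {
  grok {
    match => { "coffii_message" => "(?<inxmldata><env:Envelope(.|\r|\n)*)" }
    # Hypothetical tag name, used only to mark events without an XML envelope
    # (the default grok failure tag would be _grokparsefailure).
    tag_on_failure => ["_no_xml_payload"]
  }

  # Only parse XML when the grok above actually captured an envelope.
  if "_no_xml_payload" not in [tags] {
    xml {
      source            => "inxmldata"
      target            => "parsed_xml"
      remove_namespaces => true
      force_array       => false
    }
  }
}
```

Checking `if [inxmldata]` as above works just as well; the tag version mainly helps when you also want to count or route the events that had no XML.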

Secondly, I have two XPath tree structures in the XML for fetching "Location_ID":
/Envelope/Header/Body/ConfirmScheduleRequest/Location_ID/
/Envelope/Header/Body/QueryScheduleRequest/Location_ID/

I want to know a way, in the XML filter, to use XPath to check whether a tag exists or not.

Then why not extract both elements into separate fields and use whichever one has a value?
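A minimal sketch of that approach, assuming Body is a direct child of Envelope (as in a standard SOAP envelope, and as the `[parsed_xml][Body]` references in the config suggest) and that an XPath expression with no match simply leaves its destination field unset. The `qsr_location_id` and `qsr_reservation_id` names are hypothetical temporaries; only `Location_ID` and `Reservation_ID` are left on the event.

```
filter {
  if [inxmldata] {
    xml {
      source            => "inxmldata"
      # No parsed_xml tree is stored; only the XPath results are kept.
      store_xml         => false
      remove_namespaces => true
      xpath => {
        "/Envelope/Body/ConfirmScheduleRequest/Location_ID/text()"    => "Location_ID"
        "/Envelope/Body/ConfirmScheduleRequest/Reservation_ID/text()" => "Reservation_ID"
        "/Envelope/Body/QueryScheduleRequest/Location_ID/text()"      => "qsr_location_id"
        "/Envelope/Body/QueryScheduleRequest/Reservation_ID/text()"   => "qsr_reservation_id"
      }
    }

    # Use the QueryScheduleRequest values only if the ConfirmScheduleRequest
    # XPath found nothing.
    if ![Location_ID] and [qsr_location_id] {
      mutate {
        rename => {
          "qsr_location_id"    => "Location_ID"
          "qsr_reservation_id" => "Reservation_ID"
        }
      }
    }

    # Drop the raw payload and any leftover temporary fields.
    mutate { remove_field => ["inxmldata", "qsr_location_id", "qsr_reservation_id"] }
  }
}
```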

Thanks, Magnus, for the prompt reply.

It resolved my first issue, thank you very much, but my second issue is still not resolved. As I understood it, I created two XPath expressions successfully, but I need only one field (Location_ID). In that case I have to delete whichever field is empty or null at the end of XML parsing. I tried to delete the empty one but was unable to. Below is my config file after the change. Kindly advise and check the code.

if [inxmldata] {
  xml {
    source => "inxmldata"
    target => "parsed_xml"
    store_xml => true
    remove_namespaces => true
    xpath => ["/Envelope/Header/Body/ConfirmScheduleRequest/Location_ID/text()", "location_id"]
    xpath => ["/Envelope/Header/Body/QueryScheduleRequest/Location_ID/text()", "location_id_quesch"]
    force_array => false
  }
  mutate {
    add_field => {
      "Location_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Location_ID]}"
      "Reservation_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Reservation_ID]}"
      "Location_ID_QueSchReq" => "%{[parsed_xml][Body][QueryScheduleRequest][Location_ID]}"
      "Reservation_ID_QueSchReq" => "%{[parsed_xml][Body][QueryScheduleRequest][Reservation_ID]}"
    }
  }

  if [Location_ID] == "" {
    mutate {
      remove_field => ["inxmldata", "parsed_xml", "Location_ID", "Reservation_ID"]
    }
  } else {
    mutate {
      remove_field => ["inxmldata", "parsed_xml", "Location_ID_QueSchReq", "Reservation_ID_QueSchReq"]
    }
  }
}

Secondly, can you elaborate on the GREEDYDATA observation? As I understand it, we use GREEDYDATA where any type of character (digits, letters) can appear. I used it where I found both types in my data.

  • Why are you using xpath when you're adding additional fields from parsed_xml anyway?
  • Are the Location_ID etc. fields being created with the correct values?
  • What does an event that gets incorrectly processed look like?
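A likely reason the `if [Location_ID] == ""` check in the config above never fires: when a `%{...}` reference points at a field that doesn't exist, Logstash leaves the reference text in the value instead of substituting an empty string, so Location_ID ends up holding the literal `%{[parsed_xml]...}` text. A sketch that tests the parsed tree directly before copying anything, assuming the same `[parsed_xml][Body]...` layout that the config references:

```
filter {
  if [inxmldata] {
    xml {
      source            => "inxmldata"
      target            => "parsed_xml"
      remove_namespaces => true
      force_array       => false
    }

    # Check which request type is present in the parsed tree, so that
    # Location_ID is only ever copied from a path that actually exists.
    if [parsed_xml][Body][ConfirmScheduleRequest][Location_ID] {
      mutate {
        add_field => {
          "Location_ID"    => "%{[parsed_xml][Body][ConfirmScheduleRequest][Location_ID]}"
          "Reservation_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Reservation_ID]}"
        }
      }
    } else if [parsed_xml][Body][QueryScheduleRequest][Location_ID] {
      mutate {
        add_field => {
          "Location_ID"    => "%{[parsed_xml][Body][QueryScheduleRequest][Location_ID]}"
          "Reservation_ID" => "%{[parsed_xml][Body][QueryScheduleRequest][Reservation_ID]}"
        }
      }
    }

    mutate { remove_field => ["inxmldata", "parsed_xml"] }
  }
}
```

Only two fields are created per event, so nothing has to be deleted afterwards.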

Secondly, can you elaborate on the GREEDYDATA observation? As I understand it, we use GREEDYDATA where any type of character (digits, letters) can appear. I used it where I found both types in my data.

With more than one DATA or GREEDYDATA pattern in the same expression, slight changes in the input could make the expression match differently, and not as you'd expect. I can't come up with an example right now, but cases like that pop up here from time to time. Secondly, a more exact grok pattern also gives better performance, so there's really no reason to overuse those patterns.
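To make the ambiguity concrete with the first pattern from the original post: GREEDYDATA is `.*`, so in `%{GREEDYDATA:jms_message_id},%{GREEDYDATA:esb_conversation_id}\]` the first capture takes as much as it can. If the bracketed section were `[svc,1,a,b,c]` (and the rest of the line contained no comma), jms_message_id would come out as `a,b` and esb_conversation_id as `c`. A sketch with explicit character classes instead, assuming task_descriptor never contains `]` and neither ID ever contains `,` or `]`:

```
filter {
  grok {
    # Hypothetical tightened version of the first pattern: each bracketed
    # field stops at its next delimiter instead of using GREEDYDATA.
    # Assumes task_descriptor has no "]" and the two IDs have no "," or "]".
    match => {
      "message" => "%{LOGLEVEL:logtype}\s*%{TIMESTAMP_ISO8601:coffii_timestamp}\s*\[(?<task_descriptor>[^\]]*)\]\s*\[%{WORD:service_name}?,%{NUMBER:invocation_id}?,(?<jms_message_id>[^,\]]*),(?<esb_conversation_id>[^\]]*)\]\s*%{GREEDYDATA:coffii_message}"
    }
  }
}
```

A single trailing GREEDYDATA for the free-form message is fine; the problem only arises when several of them compete for the same text.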

Yes, the XPath was wrong in my last post; I noticed and corrected it after posting. I found a workaround: I learned that I can use mutate with replace to change the value. My issue is that I have two fields (Location_ID, Reservation_ID) that have to be fetched from different XPath locations. Below is my changed code.

if [inxmldata] {
  xml {
    source => "inxmldata"
    target => "parsed_xml"
    store_xml => true
    remove_namespaces => true
    xpath => [
      "/Envelope/Header/Body/ConfirmScheduleRequest/Location_ID/text()", "location_id",
      "/Envelope/Header/Body/QueryScheduleRequest/Location_ID/text()", "location_id_quesch"
    ]
    force_array => false
  }
  mutate {
    add_field => {
      "Location_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Location_ID]}"
      "Reservation_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Reservation_ID]}"
    }
    convert => [ Location_ID, "integer"]
  }
  if [Location_ID] > 0 {
    mutate {
      remove_field => ["inxmldata", "parsed_xml"]
    }
  } else {
    mutate {
      remove_field => ["inxmldata", "parsed_xml"]
      replace => {
        "Location_ID" => "%{[parsed_xml][Body][QueryScheduleRequest][Location_ID]}"
        "Reservation_ID" => "%{[parsed_xml][Body][QueryScheduleRequest][Reservation_ID]}"
      }
    }
  }
}

The current logic is: if the data was fetched from the "/Envelope/Header/Body/ConfirmScheduleRequest/Location_ID" path, fine; otherwise I replace it with the value from the /Envelope/Header/Body/QueryScheduleRequest/Location_ID path. Everything works except that I need to know whether my Location_ID holds a valid number or an unparsed string. For this I did the following:
convert => [ Location_ID, "integer"]
and check whether it's greater than 0. If it is, there's no need to fetch the other XPath format; otherwise I fetch the other pattern. But somehow I am unable to convert the string to an integer. Kindly advise whether this logic is right or whether there's another workaround. I don't want to create all four fields because every field is indexed.

I repeat: What does an event that gets incorrectly processed look like?
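On the convert question: within a single mutate block, common options such as add_field are applied only after mutate's own operations (convert, replace and so on) have run, so the `convert` in the config above runs before Location_ID exists. And when the ConfirmScheduleRequest path is missing, the field holds the unresolved `%{...}` text rather than a number. A sketch that splits the steps into separate mutate blocks and tests for digits with a regex instead of comparing against 0, assuming a valid Location_ID is purely numeric:

```
filter {
  if [inxmldata] {
    xml {
      source            => "inxmldata"
      target            => "parsed_xml"
      remove_namespaces => true
      force_array       => false
    }

    # Step 1: copy the ConfirmScheduleRequest values (may stay as literal %{...}).
    mutate {
      add_field => {
        "Location_ID"    => "%{[parsed_xml][Body][ConfirmScheduleRequest][Location_ID]}"
        "Reservation_ID" => "%{[parsed_xml][Body][ConfirmScheduleRequest][Reservation_ID]}"
      }
    }

    # Step 2: anything that isn't all digits counts as "not found",
    # so fall back to the QueryScheduleRequest values.
    if [Location_ID] !~ /^\d+$/ {
      mutate {
        replace => {
          "Location_ID"    => "%{[parsed_xml][Body][QueryScheduleRequest][Location_ID]}"
          "Reservation_ID" => "%{[parsed_xml][Body][QueryScheduleRequest][Reservation_ID]}"
        }
      }
    }

    # Step 3: convert in its own block, and only when the value is numeric.
    if [Location_ID] =~ /^\d+$/ {
      mutate { convert => { "Location_ID" => "integer" } }
    }

    mutate { remove_field => ["inxmldata", "parsed_xml"] }
  }
}
```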

I solved my issue in light of your guidance. Thank you so much! 🙂
