Issue in using xpath in xml filter

Hi,

I am just beginning to learn ELK, Grok etc.

I recently installed Elastic Stack (ELK and Filebeat) to monitor RedHat JBoss Fuse log files.

My goal is to create new fields, extract their values with Logstash's xml filter (xpath), and put each value into its field with mutate/replace.

The GREEDYDATA:messagetext may contain different XML data depending on the web service being called. Sometimes the fields are in the header, sometimes in the body, and so on. Sometimes the data is a SOAP message, sometimes plain XML.

Below is an example for one field, "EAITransDate". I look for "transactionDate" in messagetext, get its text value, and put it into the EAITransDate field.

Kibana should show a value in the EAITransDate field, but the field is blank.

Is anything wrong with this xpath => ["//*[local-name()='transactionDate']/text()","EAITransDate"]?

I even tried xpath => ["//transactionDate/text()","EAITransDate"], but that does not work either.

I need help/advice.

The Logstash filter configuration I am using is given below.

filter {
  grok {
    match => {
      message => "%{TIMESTAMP_ISO8601:logdate}%{SPACE}\|%{SPACE}%{LOGLEVEL:level}%{SPACE}\|%{SPACE}%{DATA:thread}%{SPACE}\|%{SPACE}%{DATA:category}%{SPACE}\|%{SPACE}%{DATA:bundle}%{SPACE}\|%{SPACE}%{GREEDYDATA:messagetext}"
    }
  }

  mutate {
    add_field => { "EAITransDate" => "" }
    add_field => { "EAITransTime" => "" }
    add_field => { "EAITransType" => "" }
    # ... more fields
  }

  if "transactionDate" in [messagetext] {
    xml {
      source    => "messagetext"
      store_xml => "false"
      xpath     => ["//*[local-name()='transactionDate']/text()", "EAITransDate"]
    }

    mutate {
      replace => {
        "EAITransDate" => "%{EAITransDate}"
      }
    }
  }
}

A sample log line is given below. The GREEDYDATA part varies depending on the service called by the front end. Sometimes the field name is spelled differently or uses a different case (lower case, upper case).

2019-01-07 20:10:44,831 | INFO  | qtp1223493273-50 | InquiryServices           | 981 - org.apache.cxf.cxf-core - 3.0.4.redhat-621084 | Inbound Messagen  | ----------------------------n  | ID: 32850n  | Address: http://localhost:9000/cxf/AccountInquiryServices?wsdln  | Encoding: UTF-8n  | Http-Method: POSTn  | Content-Type: text/xml; charset=UTF-8n  | Headers: {Accept=[application/soap+xml,multipart/related,text/*], accept-encoding=[gzip,deflate], ARM_CORRELATOR=[DoNotTraceLowerProtocol], connection=[keep-alive], content-type=[text/xml; charset=UTF-8], Host=[localhost:9000], IBM-WAS-CLIENT=[TRUE], SAVECONNECTION=[13425254451546863044797], SOAPAction=[""], transfer-encoding=[chunked], User-Agent=[IBM WebServices/1.0], wsdl=[]}n  | Payload: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://ws.eai.beans.com">
n  |    <soapenv:Header/>
n  |    <soapenv:Body>
n  |       <ws:InqRequest>
n  |       	  <applId>ML123</applId>
n  | 		  <applName>ML123</applName>
n  | 		  <applTransId>30245</applTransId>
n  | 		  <applUserId />
n  | 		  <bankCode>1236</bankCode>
n  | 		  <branchNumber>0987</branchNumber>
n  | 		  <custNo>15243636</custNo>
n  | 		  <transactionCode>XYZ001</transactionCode>
n  | 		  <transactionDate>28012019</transactionDate>
n  | 		  <transactionTime>152312</transactionTime>
n  | 		  <transactionUserId>NEWUSERID</transactionUserId>
n  | 		  <transactionUserInfo />
n  | 		  <versionNumber>1</versionNumber>
n  |   		<versionNumber>1</versionNumber>
n  |       </ws:InqRequest>
n  |    </soapenv:Body>
n  | </soapenv:Envelope>n  | --------------------------------------

Thank you.

Where did the n characters come from? Are they in the original log text?

In your grok pattern you have 5 %{SPACE}\|%{SPACE} sections, but I count 13 such character sequences before the 'Payload:' text, which is where the XML starts. Further, the n | inside the XML won't parse. The xpath may be correct.

It looks to me like you have KV data between the two ---------------------------- "begin/end" texts.

You need to:

  1. clean the original string of the n characters (if necessary) using mutate/gsub
  2. update the grok pattern to isolate the KV data
  3. use the KV filter to split the kv data into fields like Content-Type and Payload
  4. use the xml filter to pluck data from the xml text in the Payload field.
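
The steps above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested configuration: it assumes the separator appears exactly as in the sample line (a literal n, some spaces, a pipe, then a space), it skips the kv step and pulls the XML out directly with a grok capture instead, and the field name payload and the tag _no_xml_payload are made up for illustration.

```
filter {
  # Step 1: remove the "n  | " separators from the message text.
  # Assumption: the separator is a literal "n", whitespace, "|", one whitespace,
  # as it appears in the sample line. Adjust the regex if the raw log differs.
  mutate {
    gsub => [ "messagetext", "n\s+\|\s", "" ]
  }

  # Steps 2-3 (simplified): capture everything from the first "<" to the last ">"
  # into a separate field, so the xml filter only sees the XML document.
  grok {
    match          => { "messagetext" => "(?<payload><.*>)" }
    tag_on_failure => ["_no_xml_payload"]
  }

  # Step 4: run xpath against the isolated XML only.
  if [payload] {
    xml {
      source    => "payload"
      store_xml => false
      xpath     => {
        "//*[local-name()='transactionDate']/text()" => "EAITransDate"
      }
    }
  }
}
```

The main point is that the xml filter's source field should contain only the XML document. In the original configuration the source is messagetext, which also carries the "Inbound Message", "Address:", "Headers:" and similar text in front of the payload.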

Thank you for the response.

Yes, the n is part of the original text (i.e. the GREEDYDATA part), and the content of that part varies from service to service. There are many services and the content is not consistent. The data may or may not have a soapenv element.

The data after the 5th %{SPACE}\|%{SPACE} section is supposed to be greedydata (i.e. application supplied message) according to the Log4j's pattern layout being used.

The %m%n below is basically the greedydata.

File appender

log4j.appender.out=org.apache.log4j.RollingFileAppender
log4j.appender.out.layout=org.apache.log4j.SanitizingPatternLayout
log4j.appender.out.layout.replaceRegex=\n
log4j.appender.out.layout.replacement=\\n |\u0020
log4j.appender.out.layout.ConversionPattern = %d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n

I tested the XML below online (https://www.freeformatter.com/) just to verify whether my xpath syntax is correct. It works.

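<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://ws.eai.beans.com">
n | <soapenv:Header/>
n | <soapenv:Body>
n | <ws:InqRequest>
n | <applId>ML123</applId>
n | <applName>ML123</applName>
n | <applTransId>30245</applTransId>
n | <applUserId />
n | <bankCode>1236</bankCode>
n | <branchNumber>0987</branchNumber>
n | <custNo>15243636</custNo>
n | <transactionCode>XYZ001</transactionCode>
n | <transactionDate>28012019</transactionDate>
n | <transactionTime>152312</transactionTime>
n | <transactionUserId>NEWUSERID</transactionUserId>
n | <transactionUserInfo />
n | <versionNumber>1</versionNumber>
n | <versionNumber>1</versionNumber>
n | </ws:InqRequest>
n | </soapenv:Body>
n | </soapenv:Envelope>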

For testing, I extracted only the soapenv part of the data. The n | inside the XML seems to be parsed successfully and I can get the date value (28012019). Does this mean the xml filter in Logstash cannot parse the content if it contains n | ?

I am not sure what you mean by KV data between the two ---------- lines.

My goal is to handle the GREEDYDATA content from the various services in a general way: parse it with the xml filter, create new fields, extract the data using xpath, and set each value on the newly created field.

Sample data for one of the services that does not have a SOAP envelope:

2019-01-07 20:12:01,767 | INFO | inquiry_jms] | inquiry-main-route | 960 - org.apache.camel.camel-core - 2.15.1.redhat-621084 | Message received from Queue : <Msg><Header><EAIMsgVerNo>1</EAIMsgVerNo><EAITransType>MXD123</EAITransType><EAIApplName>BVC</EAIApplName><EAIApplID>NVC</EAIApplID><EAIApplTransID>XYZ</EAIApplTransID><EAITransDate>070119</EAITransDate><EAITransTime>201048</EAITransTime><EAITransUserID>MDC</EAITransUserID><EAITransUserInfo>MDC</EAITransUserInfo><EAIApplUserID>AGBANKING</EAIApplUserID><EAIBankCode>10</EAIBankCode><EAIBranchNo>1111</EAIBranchNo><EAIControlUnit>WB</EAIControlUnit></Header><Body><NameICInq><Request><TellerID>GDX</TellerID><JournalSeq>0</JournalSeq><CASAAccNo>1002171000012945</CASAAccNo><AccType>C</AccType><CurrCode>USD</CurrCode></Request></NameICInq></Body></Msg>
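
One possible way to cover both kinds of message with a single filter is sketched below. It is an untested sketch under assumptions: the XML is taken to be everything from the first "<" to the last ">" in messagetext, the field name payload and the tag _no_xml_payload are hypothetical, and the XPath 1.0 translate() trick is used here to lower-case element names so that spelling in a different case still matches.

```
filter {
  # Isolate the XML document, whether it is a SOAP envelope or a plain <Msg>.
  grok {
    match          => { "messagetext" => "(?<payload><.*>)" }
    tag_on_failure => ["_no_xml_payload"]
  }

  if [payload] {
    xml {
      source    => "payload"
      store_xml => false
      xpath     => {
        # Matches transactionDate in any letter case, or EAITransDate as spelled.
        "//*[translate(local-name(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='transactiondate' or local-name()='EAITransDate']/text()" => "EAITransDate"
        # Same idea for the time field.
        "//*[translate(local-name(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='transactiontime' or local-name()='EAITransTime']/text()" => "EAITransTime"
      }
    }
  }
}
```

Each xpath entry maps one expression to one destination field, so further name variants can be added with more "or" clauses in the same predicate.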

I would appreciate your help.

Thank you.