How to grok complex patterns with logstash 7.1

Hi. I am aware that DATA and GREEDYDATA are expensive patterns, but they are the simplest to use. I am using the following pattern to extract data from a JMS XML message. I am pasting both the input and the pattern I have configured. Please help me build an optimal pattern. I have tried every pattern I could think of to reduce the use of DATA and GREEDYDATA, but nothing worked.

Input :-
####<04-Feb-2020 18:20:07 o'clock GMT> <> <1580840407665> <929588> <ID:<360433.1580840407631.0>> <> <com.my.capact.dl.jms.ipub.jmsmodule.NIAB-NPMD-DL-01!com.my.capact.dl.jms.ipub.mli.notification.NIAB-NPMD-DL-01> <> <> <<?xml version="1.0" encoding="UTF-8"?><mes:WLJMSMessage xmlns:mes="http://www.bea.com/WLS/JMS/Message"><mes:Header><mes:JMSTimestamp>1580840407631</mes:JMSTimestamp><mes:Properties/></mes:Header><mes:Body><mes:Text><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:m="http://capabilities.nat.my.com/xsd/ManageEventNotification/2006/12/30" xmlns:m0="http://wsi.nat.my.com/2005/06/StandardHeader/" xmlns:m1="http://capabilities.nat.my.com/xsd/ManageEventNotification/2006/12/30/CCM/Events"><soapenv:Body><m:notify><m0:standardHeader><m0:e2e><m0:E2EDATA>E2E.busTxnStage=NOT,E2E.compTxnName=P1,E2E.compTxnID=4p3zpoljmy,E2E.from=IPUB-ROmy,E2E.to=MQREP,E2E.graphID=1.1.1.1,E2E.threadID=4f38vq5djz,E2E.busProcType=notify,E2E.busProcOriginator=NMDB-MLI,E2E.threadID.1=:,E2E.busTxnType=MENNotifications,E2E.busTxnHdr=PCK002069,E2E.busTxnSys=NMDB_MENNotifica,E2E.busTxnLoc=UNKNOWN,E2E.busTxnUsr=wbrkadm,E2E.busTxnSeq=4f38vq5co9</m0:E2EDATA></m0:e2e><m0:serviceState><m0:stateCode>OK</m0:stateCode><m0:errorCode>0</m0:errorCode><m0:errorDesc/><m0:errorText>0</m0:errorText></m0:serviceState><m0:serviceAddressing><m0:from>http://capabilities.nat.my.com/ManageEventNotification/2006/12/30</m0:from><m0:to><m0:address>java://com.my.capact.dl.jms.ipub.mli.notification.NIAB-NPMD-DL-01@com.my.capact.dl.jms.ipub.cf.NIAB-NPMD-DL-01</m0:address></m0:to><m0:messageId/><m0:serviceName>http://capabilities.nat.my.com/ManageEventNotification/2006/12/30</m0:serviceName><m0:action>http://capabilities.nat.my.com/ManageEventNotification/2006/12/30#notify</m0:action></m0:serviceAddressing><m0:serviceSpecification
><m0:payloadFormat>XML</m0:payloadFormat><m0:version>1.0</m0:version><m0:revision/></m0:serviceSpecification></m0:standardHeader><m:notificationMessage><m1:laxTopicValidation>false</m1:laxTopicValidation><m1:subscriptionReference><m1:address>uuid:f31a19f4-0a9d-11ea-9fc2-0abba92a0000</m1:address></m1:subscriptionReference><m1:message><mlidata:inventory xmlns:ssp1="https://collaborate.my.com/svn/edm/ssp/trunk/UnstructuredAddress.xsd" xmlns:ssp="https://collaborate.my.com/svn/edm/ssp/trunk/Address.xsd" xmlns:adli="https://collaborate.my.com/svn/edm/adli/IPAddress" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:head="http://wsi.nat.my.com/2005/06/StandardHeader/" xmlns:mlidata="http://collaborate.my.com/svn/sdkrepo/pbp/MLI/tags/8/ManageLogicalInventoryData/" xsi:schemaLocation="http://collaborate.my.com/svn/sdkrepo/pbp/MLI/tags/8/ManageLogicalInventoryData/ ManageLogicalInventoryData.xsd"><head:standardHeader><head:e2e><head:E2EDATA/></head:e2e><head:serviceState><head:stateCode>OK</head:stateCode></head:serviceState><head:serviceAddressing><head:from>http://capabilities.nat.my.com/ManageLogicalInventory/APP06104</head:from><head:to><head:address>http://capabilities.nat.my.com/ManageLogicalInventory/NIAB-NPMD-DL-01</head:address></head:to><head:replyTo><head:address>http://capabilities.nat.my.com/ManageLogicalInventory/APP06104</head:address></head:replyTo><head:messageId>145498609</head:messageId><head:serviceName>http://capabilities.nat.my.com/ManageLogicalInventory</head:serviceName><head:action>inventoryNotification</head:action></head:serviceAddressing><head:serviceSpecification><head:payloadFormat>XML</head:payloadFormat><head:version>8.0</head:version><head:revision/></head:serviceSpecification></head:standardHeader><mlidata:logicalInventory 
messageType="Notification"><mlidata:numberOfRecords>1</mlidata:numberOfRecords><mlidata:startIndex>1</mlidata:startIndex><mlidata:managementDomains><mlidata:managementDomain><mlidata:name><mlidata:rdn><mlidata:type>AID</mlidata:type><mlidata:value/></mlidata:rdn></mlidata:name><mlidata:id/><mlidata:managedElements><mlidata:managedElement><mlidata:id>1064145334</mlidata:id><mlidata:aliasNames><mlidata:aliasName><mlidata:key>BFGDeviceId</mlidata:key><mlidata:value>19987711</mlidata:value></mlidata:aliasName><mlidata:aliasName><mlidata:key>NetworkName</mlidata:key><mlidata:value>EU-Switzerland</mlidata:value></mlidata:aliasName><mlidata:aliasName><mlidata:key>SysObjectId</mlidata:key><mlidata:value/></mlidata:aliasName></mlidata:aliasNames><mlidata:userLabel/><mlidata:owner>NMDB</mlidata:owner><mlidata:namingOS/><mlidata:source>NETWORK_EMS</mlidata:source><mlidata:resourceState>INSTALLED</mlidata:resourceState><mlidata:resourceDates><mlidata:resourceDate><mlidata:dateType>ACTUAL_DATE</mlidata:dateType><mlidata:resourceState>INSTALLED</mlidata:resourceState><mlidata:dateTime>2018-01-23T15:29:48.0Z</mlidata:dateTime></mlidata:resourceDate><mlidata:resourceDate><mlidata:dateType>LAST_MODIFIED_DATE</mlidata:dateType><mlidata:resourceState>INSTALLED</mlidata:resourceState><mlidata:dateTime>2020-02-04T18:16:53.0Z</mlidata:dateTime></mlidata:resourceDate></mlidata:resourceDates><mlidata:operationalState>Y</mlidata:operationalState><mlidata:probes/><mlidata:logicalLocation><mlidata:siteId>3681373</mlidata:siteId><mlidata:name>EU-CH-ZUTH</mlidata:name><mlidata:addressRef><ssp:addressKey/><ssp:street/><ssp:city/><ssp:countyStateProvince/><ssp:country/><ssp:postCode/><ssp:dateTimeCreated/>lt;/mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>REPORT_SERVICE_LEVEL</mlidata:key><mlidata:value/></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>DeviceCategory</mlidata:key><mlidata:value>Managed</mlidata:value></mlidata:additionalInfo><mlidata:additionalInfo><mlid
ata:key>DeviceStatus</mlidata:key><mlidata:value>In Service</mlidata:value></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>SerialNumber</mlidata:key>mlidata:value/></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>NTN_SSV_ID</mlidata:key><mlidata:value>2949540</mlidata:value></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>CEASED_DATE</mlidata:key><mlidata:value/></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>CollectedSerialNumber</mlidata:key><mlidata:value/></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>HsrpIpAddress</mlidata:key><mlidata:value/></mlidata:additionalInfo><mlidata:additionalInfo><mlidata:key>CustomerHostName/mlidata:additionalInfo></mlidata:meVendorExtensions><mlidata:managementIPAddresses><mlidata:ipAddress><adli:ipName/><adli:ipValue/><adli:dnsName>abc-ch-zuth-as01</adli:dnsName><adli:dnsNameAlias/><adli:subnetworkMask/><adli:ippool/><adli:ipType>Management</adli:ipType></mlidata:ipAddress><mlidata:ipAddress><adli:ipName/><adli:ipValue>10.20.30.40</adli:ipValue><adli:dnsName/><adli:dnsNameAlias/><adli:subnetworkMask/><adli:ippool/><adli:ipType>Customer</adli:ipType></mlidata:ipAddress><mlidata:ipAddress><adli:ipName/><adli:ipValue/><adli:dnsName/><adli:dnsNameAlias/><adli:subnetworkMask/><adli:ippool/><adli:ipType>CustomerIPv6</adli:ipType></mlidata:ipAddress></mlidata:managementIPAddresses></mlidata:managedElement></mlidata:managedElements></mlidata:managementDomain></mlidata:managementDomains></mlidata:logicalInventory></mlidata:inventory></m1:message><m1:topic><m1:name>http://itprogrammes.intra.my.com/topics:/my_GS/MLI/Device</m1:name></m1:topic><m1:defaultPersistence>true</m1:defaultPersistence><m1:creationTime>v_soap_date1</m1:creationTime><m1:timeToLive xsi:nil="true"/><m1:producerReference><m1:address>NMDB</m1:address></m1:producerReference></m:notificationMessage></m:notify></soapenv:Body></soapenv:Envelope></mes:Text></mes:Body></mes:WLJMSMessage>> <>

The pattern I have created is:
<%{DATA:DateTime}>%{SPACE:thrash}<%{DATA:thrash}ID:<%{DATA:jms_message_id}>%{DATA:thrash}!%{DATA:jms_destination}> <%{DATA:jms_message_status}>%{DATA:thrash}soapenv%{DATA:thrash}mlidata:inventory%{DATA:thrash}ManageLogicalInventoryData.xsd%{DATA:thrash}messageId>%{DATA:mliMessageId}&lt%{DATA:thrash}head:serviceSpecification%{DATA:thrash}mlidata:numberOfRecords%{DATA:thrash}mlidata:managedElements%{DATA:thrash}BFGDeviceId%{DATA:thrash}value>%{DATA:mliDeviceBFGId}&lt%{DATA:thrash}SysObjectId%{DATA:thrash}NETWORK_EMS%{DATA:thrash}LAST_MODIFIED_DATE%{DATA:thrash}mlidata:logicalLocation%{DATA:thrash}mlidata:meVendorExtensions%{DATA:thrash}BFGProductType%{DATA:thrash}CustomerName%{DATA:thrash}mlidata:value>%{DATA:mliDeviceCustomerName}&lt%{DATA:thrash}REPORT_SERVICE_LEVEL%{DATA:thrash}DeviceStatus%{DATA:thrash}value>%{DATA:mliDeviceFlowStatus}&lt%{DATA:thrash}NTN_SSV_ID%{DATA:thrash}CustomerHostName%{DATA:thrash}mlidata:managementIPAddresses%{DATA:thrash}dnsName>%{DATA:mliDevice_Hostname}&lt%{DATA:thrash}ipAddress%{DATA:thrash}ipAddress><adli:ipName%{DATA:thrash}ipValue>%{DATA:mliDeviceIPAddress}&lt

You really need to learn enough markdown to format your posts if you expect folks to read them, specifically code quoting. A few minutes' work will pay many dividends.

Oh yes, I just wanted to be thorough so that the real problem is understood. I hope someone can help me with the above.

Hi,
Will someone please help me with this? I am trying to figure out an appropriate grok pattern. Logstash throws the message below:

Timeout executing grok '<%{DATA:DateTime}>%{SPACE:thrash}<%{DATA:thrash}ID:<%{DATA:jms_message_id}>%{DATA:thrash}!%{DATA:jms_destination}> <%{DATA:jms_message_status}>%{DATA:thrash}soapenv%{DATA:thrash}mlidata:inventory%{DATA:thrash}ManageLogicalInventoryData.xsd%{DATA:thrash}messageId>%{DATA:mliMessageId}&lt%{DATA:thrash}head:serviceSpecification%{DATA:thrash}mlidata:numberOfRecords%{DATA:thrash}mlidata:managedElements%{DATA:thrash}BFGDeviceId%{DATA:thrash}value>%{DATA:mliDeviceBFGId}&lt%{DATA:thrash}SysObjectId%{DATA:thrash}NETWORK_EMS%{DATA:thrash}LAST_MODIFIED_DATE%{DATA:thrash}mlidata:logicalLocation%{DATA:thrash}mlidata:meVendorExtensions%{DATA:thrash}BFGProductType%{DATA:thrash}CustomerName%{DATA:thrash}mlidata:value>%{DATA:mliDeviceCustomerName}&lt%{DATA:thrash}REPORT_SERVICE_LEVEL%{DATA:thrash}DeviceStatus%{DATA:thrash}value>%{DATA:mliDeviceFlowStatus}&lt%{DATA:thrash}NTN_SSV_ID%{DATA:thrash}CustomerHostName%{DATA:thrash}mlidata:managementIPAddresses%{DATA:thrash}dnsName>%{DATA:mliDevice_Hostname}&lt%{DATA:thrash}ipAddress%{DATA:thrash}ipAddress><adli:ipName%{DATA:thrash}ipValue>%{DATA:mliDeviceIPAddress}&lt' against field 'message' with value 'Value too large to output (13725 bytes)! First 255 chars are: ####<11-Feb-2020 07:45:10 o'clock GMT> <> <1581407110949> <562039> <ID:<360433.1581407110917.0>> <> <com

Hi,
I replaced DATA with the regex (.*?) to reach the next token as fast as grok can, but Logstash gives me the error "Value too large to output (15440 bytes)! First 255 chars are".
Please help me configure the right grok parser.

I tried a couple of things to avoid DATA elements. There seems to be some improvement, but the problem still persists. I am not sure this forum is really helpful.
There is no point in tracking this and expecting good suggestions here! For new people who visit this portal: I think you should try on your own and not linger around these web pages; they are hardly useful :frowning: :frowning:

Have you tried using the xml filter instead of grok?

If that is not an option, the dissect filter might be a good fit too, as you seem to have structured your grok pattern around delimiters.
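As a rough sketch of the xml filter route, assuming the XML payload has first been isolated into its own field (called jms_xml here, a hypothetical name), something like the following might pull a single value out with XPath. The namespace URI and element names are taken from the sample message above; the target field name jms_timestamp is made up for illustration, and this is not a tested configuration.

```
filter {
  xml {
    # Assumption: an earlier filter copied everything from "<?xml" to
    # "</mes:WLJMSMessage>" into the (hypothetical) field "jms_xml".
    source => "jms_xml"
    # Do not store the whole parsed document on the event; only run the XPath queries.
    store_xml => false
    # Declare the prefix used in the XPath expression below.
    namespaces => { "mes" => "http://www.bea.com/WLS/JMS/Message" }
    # Each XPath result is stored as an array in the named field.
    xpath => { "/mes:WLJMSMessage/mes:Header/mes:JMSTimestamp/text()" => "jms_timestamp" }
  }
}
```

If the SOAP envelope inside mes:Text arrives escaped (the grok pattern above matches on &lt, which suggests it might), the inner document would probably need a second parsing step after this one.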

Thanks for the suggestion, Christian. I haven't tried the xml or dissect filters. To be honest, I was not aware of them. I will try them. If you can provide links to some samples, I will refer to them.

Thanks
Keshav

I would recommend looking at the documentation. I do not have any examples, as I almost never have to deal with XML data.

Hi Christian,
After going through the documentation I created my own pattern to dissect the message. I have also set "config.support_escapes" to true, but no luck. Please find below a screenshot of the dissect message. Surely I am missing something, but I am unable to trace it, as there is no online debugger to verify a dissect pattern. If you can point me in the right direction I will investigate further. I know the screenshot is too big to go through, but it holds the details of the message as well as the pattern.

Your dissect pattern does not match the message. You have a very complicated pattern, so I would develop it one field at a time. Start by matching the first field on the line. Do not try to match anything more than that. Start a copy of logstash with

--config.reload.automatic

enabled. That way you only pay the startup cost once, and it will reload the configuration and reinvoke the pipeline each time you modify the configuration. Create a file that contains your log file line then start with

input { 
    file {
        path => "C:/some/path/log.txt"
        start_position => "beginning"
        sincedb_path => "nul"
    }
}
filter { dissect { mapping => { "message" => "####<%{DateTime}>%{}" } } }
output { stdout { codec => rubydebug } }

The trailing %{} in the pattern is needed to consume (and discard) the rest of the message. Once you see a good value for DateTime in the rubydebug output, add the next field to the pattern (or two if you feel lucky). Once you write out the configuration file from your editor logstash will notice a couple of seconds later and restart the pipeline and print out a rubydebug event with (hopefully) some more fields on the event. Repeat this as you perfect the pattern for each field.
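The next iteration might look like the sketch below, checked only by eye against the sample line posted earlier in the thread. The field name epoch_millis is hypothetical; the thread never names that value.

```
filter {
  dissect {
    # Step 2: keep DateTime, treat the literal "> <> <" as the delimiter,
    # capture the next bracketed value, and discard the rest with %{}.
    mapping => { "message" => "####<%{DateTime}> <> <%{epoch_millis}>%{}" }
  }
}
```

If epoch_millis shows up in the rubydebug output with a sensible value, the literal text between the fields is right and the next field can be added in the same way.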

It may sound like a lot of effort, but it is actually much quicker than staring at a complicated pattern that doesn't quite match. This way you only ever have to worry about extending the match by one field.

If you end up having to go back to grok then this post describes a similar technique for grok.

Thanks, I will look into it and build the pattern up by adding one element at a time.

The approach you suggested worked for building the correct dissect pattern. I have deployed it to production after testing, and will monitor it for a couple of days to verify that dissect does not throw errors the way grok did ("Value too large to output").

Below is my final pattern, which has worked for the 3 messages received in production so far.

:smile: Thanks for all your suggestions so far.

Note that if you do not want to store a field you do not have to name it. You could replace %{thrash1} with %{} instead of doing the remove_field.
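As an illustration only (the full production pattern is not reproduced here), a mapping that skips unwanted values with unnamed fields might look like this. The field names DateTime and jms_message_id come from the grok pattern earlier in the thread; the layout was checked by eye against the sample line, not run.

```
filter {
  dissect {
    # Unnamed %{} fields are matched and discarded, so no remove_field is needed.
    # Here the two bracketed numbers before the ID are skipped.
    mapping => { "message" => "####<%{DateTime}> <> <%{}> <%{}> <ID:<%{jms_message_id}>>%{}" }
  }
}
```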
