Working with large XMLs - Logstash/Elasticsearch cannot handle them

Hi All,

First time posting so may edit a few times if the formatting sucks.

I have a log file from a TIBCO process that spits out small, medium and large XML messages. Filebeat sends the messages to Elasticsearch (just to be able to cross-reference) as well as to Logstash.

Logstash is giving me the headache, and I can't find anyone else who has had a similar problem. I first tried the xml filter plugin but got a message that there is a limit of 1000 fields, so I turned to grok. The small and medium XMLs parse 100%, but I still cannot find a way to parse the large ones: grok times out with this error message:

[2017-12-07T07:11:57,413][WARN ][logstash.filters.grok ] Timeout executing grok '%{MY_TIME:my_timestamp}\s*[%{WORD}]\s*[%{LOGLEVEL}\s*]\s*[%{USERNAME:tibco_process}(.\n){2,5}\s%{MY_AUDITCAT}\n\s*%{MY_TIMESTAMP}\n.\n\s%{MY_INTMSGID}\n\s*%{MY_MSGID}\n\s*%{MY_RELATESTO}\n.\n\s%{MY_SENDERID}' against field 'message' with value 'Value too large to output (9813 bytes)! First 255 chars are: 07 Dec 2017 06:55:45,829 [JobCourier0] [INFO ] [WorkTaskOrchestration] bw.logger - <?xml version="1.0" encoding="UTF-8"?>

The big XML is too big to even add here.

Basically I only need to structure the XML up to the sender ID; I don't care about the rest of the mumbo jumbo. However, I cannot find a plugin or method to drop everything after the sender ID so that I can then use my working grok pattern.

07 Dec 2017 06:56:50,643 [JobCourier0] [INFO ] [basicMessageProxy] bw.logger - <?xml version="1.0" encoding="UTF-8"?>
<ns0:AuditEvent xmlns:ns0="http://com/auditing/20120101">
<ns0:Metadata Qualifier="mob-CORE" Key="CALL_STACK" Value="basicMessageProxy"/>
<ns0:AuditCategory>AUDIT_ENTRY</ns0:AuditCategory>
<ns0:Timestamp>2017-12-07T06:56:50.642+02:00</ns0:Timestamp>
<ns0:ServiceName>basicMessageProxy</ns0:ServiceName>
<ns0:InternalMessageId>fe40b66f-53bf-4c75-b6cf-a68f947194d6</ns0:InternalMessageId>
<ns0:MessageId>fe40b66f-53bf-4c75-b6cf-a68f947194d6</ns0:MessageId>
<ns0:RelatesTo>190f6e44</ns0:RelatesTo>
<ns0:GroupIdentifier>190f6e44</ns0:GroupIdentifier>
<ns0:SenderId>basicWOM</ns0:SenderId>

I may just not have found the right filter, or this may not be possible. Any advice would be appreciated.

I found a solution to my specific problem, as I do not need all of the XML at this point in time. However, more of the XML may be needed in the future.

Use the truncate filter plugin, placed ahead of grok so that the pattern only sees the first part of the message.

filter {
  # Cut the message down before grok runs, so grok never sees the large XML tail.
  truncate {
    fields => ["message"]
    length_bytes => 1200
  }
}
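
A fixed byte limit cuts wherever byte 1200 happens to fall, so it relies on the fields before the sender ID staying short. As a rough, untested sketch of an alternative, the mutate filter's gsub option could drop everything after the closing SenderId tag instead. The regex is an assumption based on the sample message above: (?m) lets "." match newlines in Ruby regular expressions, and the lookbehind keeps the closing tag itself in the message.

```
filter {
  mutate {
    # Assumption: every message of interest contains </ns0:SenderId>.
    # (?m)  : "." also matches newlines, so the match runs to the end of the message
    # (?<=) : lookbehind, so the closing tag itself is kept
    gsub => ["message", "(?m)(?<=</ns0:SenderId>).*", ""]
  }
}
```

Like truncate, this would need to sit before the grok filter. Messages without a SenderId element do not match the pattern and pass through unchanged, so very large messages of another shape could still hit the grok timeout.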
