Strange behaviour kv field_split

Hi All,
I'm trying to parse a log using the character "|" as the field separator. It works fine for all fields, but I also get extra fields that are badly broken and consist of multiple key-value pairs, e.g.

  • 01-01015173|url=http.keyword

  • BP 437 "Wohnen am Klucksgraben - Erschließung EFH-Gebiet|url=https.keyword

  • AIRF)|pComp=CAV|riskType=1|srcZone=0dstZone=1|detectionType=1|act=not blocked|threatType=1|hostName=www.proxylists.net|cnt=1|aggregatedCnt=1|cccaDestinationFormat=URL|cccaDetectionSource=GLOBAL_INTELLIGENCE|cccaRiskLevel=2|cccaDestination=http

My config is as follows:

filter
{
    if [type] == "test1"
    {
        mutate {
            gsub => ["message", "\t", "|"]
        }
        kv {
            source => "message"
            value_split => "="
            field_split => "|"
        }
    }
}

I also found warnings in the Logstash logs that say:
[WARN ][logstash.filters.kv ][main] Exception while parsing KV {:exception=>"Invalid FieldReference: THO@UNER.DE|suser=UC@UNER.DE|mailMsgSubject=[TURM] test|url=https"}

Could you help me please?

Hi

Could you please post a sample of your input as seen by the filter? One way to do it is to deactivate the filter and post the output of stdout{}.
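
For reference, a minimal sketch of that approach: comment out the filter block and print each event in full. The rubydebug codec is a standard Logstash codec; the inputs are assumed to stay exactly as they are.

```
# filter { ... }   # temporarily commented out so events arrive unparsed

output {
    # print every event with all of its fields to the console
    stdout { codec => rubydebug }
}
```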

Hello again,

here are some samples. I deactivated the filter as you described. (I replaced sensitive information with meaningless strings but kept the syntax.)

<158>LEEF:1.0|Trend Micro|Deep Discovery Inspector|5.5.1200|SECURITY_RISK_DETECTION|devTimeFormat=MMM dd yyyy HH:mm:ss z	ptype=IDS	dvc=111.111.111.111	deviceMacAddress=11:11:11:11:11:11	dvchost=NAME1	deviceGUID=1A1A1A1A1A1A-1A1A1A1A-1A1A-1A1A-1A1A	devTime=Feb 17 2020 12:21:41 GMT+01:00	sev=2	protoGroup=SQL	proto=MSSQL	vLANId=1111	deviceDirection=1	dhost=name-name-1111	dst=111.111.111.111	dstPort=1114	dstMAC=11:11:11:11:11:11	shost=name.domain.de	src=111.111.111.111	srcPort=1113	srcMAC=11:11:11:11:11:11	malType=OTHERS	fileType=-65536	fsize=0	ruleId=504	msg=Successful log on to service	deviceRiskConfidenceLevel=3	pComp=NCIE	riskType=1	srcGroup=Name Network	srcZone=1	dstZone=0	detectionType=1	act=not blocked	threatType=2	interestedIp=111.111.111.111	peerIp=111.111.111.111	dUser1=domain\h1111	cnt=1	aggregatedCnt=1	evtCat=Authentication	evtSubCat=Login Success	aptRelated=0	pAttackPhase=Asset and Data Discovery

<158>LEEF:1.0|Trend Micro|Deep Discovery Inspector|5.5.1200|SECURITY_RISK_DETECTION|devTimeFormat=MMM dd yyyy HH:mm:ss z	ptype=IDS	dvc=111.111.111.111	deviceMacAddress=11:11:11:11:11:11	dvchost=NAME1	deviceGUID=1A1A1A1A1A1A-1A1A1A1A-1A1A-1A1A-1A1A	devTime=Feb 17 2020 12:02:07 GMT+01:00	sev=2	protoGroup=CIFS	proto=SMB2	vLANId=1115	deviceDirection=0	dhost=name.domain.de	dst=111.111.111.111	dstPort=145	dstMAC=11:11:11:11:11:11	shost=111.111.111.111	src=111.111.111.111	srcPort=1113	srcMAC=11:11:11:11:11:11	malType=OTHERS	filePath=H1111\Outlook\	fname=~archive.pst.tmp	fileType=327680	fsize=4096	ruleId=4222	msg=PST File Upload	deviceRiskConfidenceLevel=3	pComp=NCIE	riskType=1	srcZone=0	dstGroup=Name Network	dstZone=1	detectionType=1	act=not blocked	threatType=2	interestedIp=111.111.111.111	peerIp=111.111.111.111	fileHash=9296DE94851FF12571C42E80D78D169AAC683CD3	sUser1=domain\h1111	cnt=3	aggregatedCnt=1	evtCat=File	aptRelated=0	pAttackPhase=Data Exfiltration
	
<158>LEEF:1.0|Trend Micro|Deep Discovery Inspector|5.5.1200|SECURITY_RISK_DETECTION|devTimeFormat=MMM dd yyyy HH:mm:ss z	ptype=IDS	dvc=111.111.111.111	deviceMacAddress=11:11:11:11:11:11	dvchost=NAME1	deviceGUID=1A1A1A1A1A1A-1A1A1A1A-1A1A-1A1A-1A1A	devTime=Feb 17 2020 12:19:02 GMT+01:00	sev=6	protoGroup=SMTP	proto=SMTP	vLANId=4095	deviceDirection=1	dhost=111.111.111.111	dst=111.111.111.111	dstPort=111	dstMAC=11:11:11:11:11:11	shost=name.domain.intern	src=111.111.111.111	srcPort=1117	srcMAC=11:11:11:11:11:11	malType=OTHERS	fileType=393871360	fsize=1029	ruleId=29	msg=Unregistered sender and recipient domains - Email	deviceRiskConfidenceLevel=3	duser=name@domain.de	suser=name@domain.de	mailMsgSubject=Reigabe Vong 180009: Prg und Gigung move eltor G ( 0102 )	pComp=CAV	riskType=1	srcGroup=Name Network	srcZone=1	dstZone=0	detectionType=1	act=not blocked	threatType=2	interestedIp=111.111.111.111	peerIp=111.111.111.111	fileHash=5A289B28850A26C12882EC11AD45E5D78C16775D	cnt=1	aggregatedCnt=1	evtCat=Suspicious Traffic	evtSubCat=Email	aptRelated=0

Update:
The failure for this index disappeared. I checked today and all fields are parsed correctly.

But I got the same strange error after editing another index. It seems to me that the error occurs when a (broken) field is created that I didn't define.

Here is an example which is related to the config and log I already posted:

Logstash is creating the message field and all fields located in it correctly, e.g.:

 message = <158>LEEF:1.0|Trend Micro|Deep Discovery Inspector|5.5.1200|SECURITY_RISK_DETECTION|devTimeFormat=MMM dd yyyy HH:mm:ss z	ptype=IDS	dvc=111.111.111.111	deviceMacAddress=11:11:11:11:11:11	dvchost=NAME1	deviceGUID=1A1A1A1A1A1A-1A1A1A1A-1A1A-1A1A-1A1A	devTime=Feb 17 2020 12:21:41 GMT+01:00	sev=2	protoGroup=SQL	proto=MSSQL	vLANId=1111	deviceDirection=1	dhost=name-name-1111	dst=111.111.111.111	dstPort=1114	dstMAC=11:11:11:11:11:11	shost=name.domain.de	src=111.111.111.111	srcPort=1113	srcMAC=11:11:11:11:11:11	malType=OTHERS	fileType=-65536	fsize=0	ruleId=504	msg=Successful log on to service	deviceRiskConfidenceLevel=3	pComp=NCIE	riskType=1	srcGroup=Name Network	srcZone=1	dstZone=0	detectionType=1	act=not blocked	threatType=2	interestedIp=111.111.111.111	peerIp=111.111.111.111	dUser1=domain\h1111	cnt=1	aggregatedCnt=1	evtCat=Authentication	evtSubCat=Login Success	aptRelated=0	pAttackPhase=Asset and Data Discovery
 pComp = Sandbox
 tags = Deep Discovery Inspector

But the additional field <158>LEEF, which I didn't define, is created as well:

 <158>LEEF = 1.0|Trend Micro|Deep Discovery Inspector|5.5.1200|SECURITY_RISK_DETECTION|devTimeFormat=MMM dd yyyy HH:mm:ss z	ptype=IDS	dvc=111.111.111.111	deviceMacAddress=11:11:11:11:11:11	dvchost=NAME1	deviceGUID=1A1A1A1A1A1A-1A1A1A1A-1A1A-1A1A-1A1A	devTime=Feb 17 2020 12:21:41 GMT+01:00	sev=2	protoGroup=SQL	proto=MSSQL	vLANId=1111	deviceDirection=1	dhost=name-name-1111	dst=111.111.111.111	dstPort=1114	dstMAC=11:11:11:11:11:11	shost=name.domain.de	src=111.111.111.111	srcPort=1113	srcMAC=11:11:11:11:11:11	malType=OTHERS	fileType=-65536	fsize=0	ruleId=504	msg=Successful log on to service	deviceRiskConfidenceLevel=3	pComp=NCIE	riskType=1	srcGroup=Name Network	srcZone=1	dstZone=0	detectionType=1	act=not blocked	threatType=2	interestedIp=111.111.111.111	peerIp=111.111.111.111	dUser1=domain\h1111	cnt=1	aggregatedCnt=1	evtCat=Authentication	evtSubCat=Login Success	aptRelated=0	pAttackPhase=Asset and Data Discovery
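
One way to narrow down where such a field comes from, sketched under the assumption that more than one filter may be touching these events: give the kv filter from the config above a target, so that everything it produces ends up under one parent field. The name leef_kv is made up for illustration.

```
filter {
    if [type] == "test1" {
        mutate {
            gsub => ["message", "\t", "|"]
        }
        kv {
            source => "message"
            value_split => "="
            field_split => "|"
            # hypothetical parent field; all keys parsed here land under [leef_kv]
            target => "leef_kv"
        }
    }
}
```

If <158>LEEF still shows up at the top level of the event rather than under leef_kv, that would suggest it was not created by this kv block.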

After I edit an index and restart Logstash to activate my changes, the field <158>LEEF breaks, so that additional fields are created which contain the remaining parts of the event.
Here is an example:

 <158>LEEF = 1.0|Trend Micro|Deep Discovery Inspector|5.5.
 1200|SECURITY_RISK_DETECTION|devTimeFormat=MMM  = dd yyyy HH:mm:ss z	ptype=IDS dvc=111.111.111.111	deviceMacAddress=11:11:11:11:11:11	dvchost=NAME1	deviceGUID=1A1A1A1A1A1A-1A1A1A1A-1A1A-1A1A-1A1A	devTime=Feb 17 2020 12:21:41 GMT+01:00	sev=2	protoGroup=SQL	proto=MSSQL	vLANId=1111	deviceDirection=1	dhost=name-name-1111	dst=111.111.111.111	dstPort=1114	dstMAC=11:11:11:11:11:11	shost=name.domain.de	src=111.111.111.111	srcPort=1113	srcMAC=11:11:11:11:11:11	malType=OTHERS	fileType=-65536	fsize=0	ruleId=504	msg=Successful log on to service	deviceRiskConfidenceLevel=3	pComp=N

Do you have any idea why the field <158>LEEF is created and what causes it to be split?
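
A possible way to keep the LEEF header away from kv altogether, as a sketch only and not a confirmed fix for the behaviour above: split off the five pipe-delimited header parts with dissect, then run kv on the tab-separated remainder. All target field names (leef_header, leef_vendor, leef_product, leef_product_version, leef_event_id, leef_payload) are invented here. Using field_split_pattern with "\t" assumes the pattern is read as a regular expression in which \t means a tab, as in the gsub of the posted config.

```
filter {
    if [type] == "test1" {
        # cut "<158>LEEF:1.0|vendor|product|version|event id|" off the front;
        # the last field takes everything after the fifth "|"
        dissect {
            mapping => {
                "message" => "%{leef_header}|%{leef_vendor}|%{leef_product}|%{leef_product_version}|%{leef_event_id}|%{leef_payload}"
            }
        }
        # parse only the tab-separated key=value part
        kv {
            source => "leef_payload"
            field_split_pattern => "\t"
            value_split => "="
        }
    }
}
```

With this layout the gsub that turns tabs into "|" is no longer needed, and a "|" inside a value (for example in a URL or a mail subject) should stay part of that value instead of starting a new pair.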