I have spent a few hours searching for a solution to this. I am parsing output from Cuckoo Sandbox (automated malware analysis), which arrives as key-value pairs. The problem is that some of the values are semicolon-separated, and I would like to store them in my database as arrays.
I have tried using the kv filter to split them, but either I am not using it as intended or I am heading in the wrong direction.
The message comes across as:
Timestamp="2015/08/01 13:52:56" id="24814" Submission="file" MD5="d18d493b20d68a37cc5bbf0dbeb72f46" SHA1="bf5ac56e8b9884c825a95499ad9f2a63f733054e" File_Name="d18d493b20d68a37cc5bbf0dbeb72f46" File_Size="45782" File_Type="HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators" MalScore="0.5" Related_IPs="-" Related_Domains="static.4shared.com;c.statcounter.com;secure.quantserve.com;www.statcounter.com" Total_TCP="0" Total_UDP="68"Virustotal="Not Found" Cuckoo_Sigs="injection_rwx" Yara="-"
I am currently parsing the data into the following format. I am very new to Grok filtering, so if you have recommendations beyond what I am asking, please feel free to share them:
"year": "2015",
"month": "08",
"day": "01",
"time": "13:52:56",
"id": "24814",
"submission": "file",
"md5": "d18d493b20d68a37cc5bbf0dbeb72f46",
"sha1": "bf5ac56e8b9884c825a95499ad9f2a63f733054e",
"filename": "d18d493b20d68a37cc5bbf0dbeb72f46",
"filesize": "45782",
"filetype": "HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators",
"malscore": "0.5",
"relatedips": "-",
"relateddomains": "static.4shared.com;c.statcounter.com;secure.quantserve.com;www.statcounter.com",
"totaltcp": "0",
"totaludp": "68",
"virustotal": "Not Found",
"cuckoosigs": "injection_rwx",
"yara": "-""
I am attempting to split three fields: relatedips and relateddomains on ";", and filetype on ",". For example, this value should become an array of four domains:
"relateddomains": "static.4shared.com;c.statcounter.com;secure.quantserve.com;www.statcounter.com"
Any help would truly be appreciated.
Jim