Using Grok: How To Parse Multiple Entries With The Same Field Name?

I've been searching for a pre-made Grok pattern for Apache's mod_security error log, and couldn't find any. An example of a log entry:

[Mon Jun 03 15:07:12.453090 2019] [:error] [pid 15595] [client 192.168.0.254:57318] [client 192.168.0.254] ModSecurity: Warning. Matched phrase "bin/bash" at ARGS:exec. [file "/etc/httpd/modsecurity.d/activated_rules/REQUEST-932-APPLICATION-ATTACK-RCE.conf"] [line "500"] [id "932160"] [msg "Remote Command Execution: Unix Shell Code Found"] [data "Matched Data: bin/bash found within ARGS:exec: /bin/bash"] [severity "CRITICAL"] [ver "OWASP_CRS/3.1.0"] [tag "application-multi"] [tag "language-shell"] [tag "platform-unix"] [tag "attack-rce"] [tag "OWASP_CRS/WEB_ATTACK/COMMAND_INJECTION"] [tag "WASCTC/WASC-31"] [tag "OWASP_TOP_10/A1"] [tag "PCI/6.5.2"] [hostname "test.us.mydomain.com"] [uri "/images/random/random-logo.png"] [unique_id "XPWaEGNi1xVn2c58vCAiEwAAAAM"]

So here's my attempt to come up with the rule:

\[%{HTTPDERROR_DATE:timestamp}\] \[(%{WORD:module})?:%{LOGLEVEL:loglevel}\] \[pid %{POSINT:pid}(:tid %{NUMBER:tid})?\] \[client %{IPORHOST:clientip}:%{POSINT:clientport}\] \[client %{IPORHOST:cip2}\] %{WORD:errorsource}: %{DATA:errormsg} \[file \"%{DATA:rulefilename}\"\] \[line \"%{POSINT:rulelinenum}\"\] \[id \"%{POSINT:ruleid}\"\] \[msg \"%{DATA:rulemsg}\"\] \[data \"%{DATA:ruledata}\"\] \[severity \"%{WORD:ruleseverity}\"\] \[ver \"%{DATA:ruleversion}\"\] \[tag \"%{DATA:ruletag1}\"\] \[tag \"%{DATA:ruletag2}\"\] \[tag \"%{DATA:ruletag3}\"\] \[tag \"%{DATA:ruletag4}\"\] \[tag \"%{DATA:ruletag5}\"\] \[tag \"%{DATA:ruletag6}\"\] \[tag \"%{DATA:ruletag7}\"\] \[tag \"%{DATA:ruletag8}\"\] \[hostname \"%{HOSTNAME:hostname}\"\] \[uri \"%{URIPATHPARAM:uri}\"\] \[unique_id \"%{WORD:uniqueid}\"\]

It's something of a brute-force approach. However, mod_security actually logs two further lines for the same error:

[Mon Jun 03 15:07:12.454321 2019] [:error] [pid 15595] [client 192.168.0.254:57318] [client 192.168.0.254] ModSecurity: Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score. [file "/etc/httpd/modsecurity.d/activated_rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "91"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total Score: 5)"] [severity "CRITICAL"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "test.us.mydomain.com"] [uri "/images/random/random-logo.png"] [unique_id "XPWaEGNi1xVn2c58vCAiEwAAAAM"]

[Mon Jun 03 15:07:12.454684 2019] [:error] [pid 15595] [client 192.168.0.254:57318] [client 192.168.0.254] ModSecurity: Warning. Operator GE matched 5 at TX:inbound_anomaly_score. [file "/etc/httpd/modsecurity.d/activated_rules/RESPONSE-980-CORRELATION.conf"] [line "86"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 5 - SQLI=0,XSS=0,RFI=0,LFI=0,RCE=5,PHPI=0,HTTP=0,SESS=0): individual paranoia level scores: 5, 0, 0, 0"] [tag "event-correlation"] [hostname "test.us.mydomain.com"] [uri "/images/random/random-logo.png"] [unique_id "XPWaEGNi1xVn2c58vCAiEwAAAAM"]

Notice that the main difference is the number of [tag] entries in each message: the first line has 8, the second has 4, and the third has only 1 (the later lines also omit some fields, such as [data] and [ver]). Obviously my brute-force approach won't match them. Is there an elegant way to parse those repeated [tag] fields with grok? I wonder if some Ruby parsing magic needs to be applied here.

Thanks in advance.

Do not try to do it all with grok. I would break off the initial common section with dissect, then pull out the ModSecurity message using grok, then chop up the rest using a kv filter. Something like:

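    # Split off the fixed prefix that every error-log line shares
    dissect { mapping => { "message" => "[%{ts}] [:%{level}] [pid %{pid}] [client %{clientA}] [client %{clientB}] %{[@metadata][restOfLine]}" } }
    # Capture the free-text ModSecurity message; keep the [key "value"] list in @metadata
    grok { match => { "[@metadata][restOfLine]" => [ "ModSecurity: (?<theMessage>[^\[]+ )(?<[@metadata][theRest]>\[.*)" ] } }
    # Split the remaining [key "value"] pairs; repeated keys such as tag become arrays
    kv { source => "[@metadata][theRest]" field_split => "\]\[" value_split => " " }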

grok is one of the most powerful (and popular) filters for parsing events. Precisely because it is the default choice, it is worth at least considering the other filters to see whether something more specific (and therefore cheaper), such as dissect or kv, can do the job.

If you need tag to be an array even when there is only a single tag (in which case kv produces a plain string), then I would use a ruby filter for that.
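A minimal sketch of such a ruby filter, assuming kv wrote the tags to the top-level field tag (its default when no target is set) and a Logstash version with the event.get / event.set API (5.0 or later). It wraps a lone string in a one-element array and leaves arrays, or a missing field, alone:

    ruby {
        code => '
            # kv yields a String for a single tag and an Array for repeated tags
            tag = event.get("tag")
            event.set("tag", [tag]) if tag.is_a?(String)
        '
    }

It would go after the kv filter in the same filter block, so that every event reaches the output with tag in a consistent shape.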

Awesome! Thanks for pointing me in the right direction!