GROK pattern issue

Hello, I am using filebeat to send logs from a DNS server to logstash before being ingested in Elasticsearch.

The same pattern is repeated with two spaces and one space due to the variation in logs over updates which is generally reverted.

None of the logs have _grokpatternfailure

However, I see that the actions field is capturing the term gravity instead of the actual action of either cached, blocked, query, reply, others.

The term GRAVITY is taken in case the logs have blocked action; I am unsure why. Can someone help me with the GROK pattern changes below?

Few logs of blocked entries

Jan 17 19:35:57 dnsmasq[879]: gravity blocked browser.pipe.aria.microsoft.com is 0.0.0.0
Jan 17 19:35:57 dnsmasq[951]: gravity blocked browser.pipe.aria.microsoft.com is NODATA
Jan 17 19:35:58 dnsmasq[951]: gravity blocked app-measurement.com is 0.0.0.0

Following is the GROK pattern:

filter {
   grok {
   patterns_dir => ["/etc/logstash/patterns/"]
   match=> ["message", "%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: gravity %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:action}\[DNSKEY] %{DATA:query_hostname} to 127.0.0.1",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} result is %{WORD:result}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV6:resolved_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} to 127.0.0.1",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{GREEDYDATA:result}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:resolved_ip} is %{GREEDYDATA:result}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:resolved_ip} is %{GREEDYDATA:query_hostname}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{DATA:query_hostname} from %{IPV4:src_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{GREEDYDATA:result}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{IPV4:query_arpa}.in-addr.arpa from %{IPV4:src_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:query_arpa}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} from %{IPV4:src_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{DATA:query_hostname} from %{IPV4:src_ip}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} ",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:query_type}\[DS] %{DATA:query_hostname} to 127.0.0.1",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:query} is %{GREEDYDATA:result}",
   "message","%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:list} %{DATA:rDNS} is %{GREEDYDATA:local_hostname}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{IPV4:query_ptr}.in-addr.arpa from %{IPV4:src_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: gravity %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:action}\[DNSKEY] %{DATA:query_hostname} to 127.0.0.1",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} result is %{WORD:result}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV6:resolved_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} to 127.0.0.1",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{GREEDYDATA:result}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:resolved_ip} is %{GREEDYDATA:result}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:resolved_ip} is %{GREEDYDATA:query_hostname}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{DATA:query_hostname} from %{IPV4:src_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{GREEDYDATA:result}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{IPV4:query_arpa}.in-addr.arpa from %{IPV4:src_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:query_arpa}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} from %{IPV4:src_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{DATA:query_hostname} from %{IPV4:src_ip}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} ",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:query_type}\[DS] %{DATA:query_hostname} to 127.0.0.1",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:query} is %{GREEDYDATA:result}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:list} %{DATA:rDNS} is %{GREEDYDATA:local_hostname}",
   "message","%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action}\[%{WORD:query_type}] %{IPV4:query_ptr}.in-addr.arpa from %{IPV4:src_ip}"]
   }

Please share the patterns you have in this path so people can try to replicate your issue.

Also, share the output you are getting and the expected output.

Hello I sincerely aplogise that I forgot to add the patterns

logdate [\w]{3}\s[\s\d]{2}\s\d\d\:\d\d\:\d\d
blocklist [\/\w\.]+
domain [\w\.\-]+
clientip \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
ip \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

Also, the expected output is to take action blocked and index it to the field action.

Here the value gravity should be blocked.

Here are a few logs from other fields
action == reply
Jan 17 19:56:57 dnsmasq[879]: reply d10zdin3c2kqf3.cloudfront.net is 18.66.53.110

action == query
Jan 17 19:56:57 dnsmasq[879]: query[A] d10zdin3c2kqf3.cloudfront.net from 10.0.0.10

action == cached
Jan 17 19:56:57 dnsmasq[951]: cached web.facebook.com is <CNAME>

First, the match option to a grok filter is a hash. logstash will convert an array to a hash pairwise, but I find it clearer to use a hash in the configuration file.

Second, I would recommend matching [message] against an array of patterns rather than using multiple entries in the match hash. So I would use

match=> {
    "message" => [
        "%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
        "%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: gravity %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
        "%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{DATA:action}\[DNSKEY] %{DATA:query_hostname} to 127.0.0.1",
        ....
    ]
}

Third, anchor your patterns. grok is expensive when it fails to match, not when it matches. If your lines start with the timestamp then add ^ at the start of each pattern.

match=> {
    "message" => [
        "^%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
        ....

Fourth, you are matching the first part of the line over and over and over again, which is expensive. It will be cheaper to split this into two groks

grok { match=> { "message" => [ "^%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{GREEDYDATA:[@metadata][restOfLine]}" } }
grok {
    match=> {
        "[@metadata][restOfLine]" => [
            "^%{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
            "^gravity %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
            "^%{DATA:action}\[DNSKEY] %{DATA:query_hostname} to 127.0.0.1",
            ....
        ]
}

Fifth, instead of repeating the patterns with either one or two spaces, use a regexp that matches both

"%{MONTH:date}\s{1,2}%{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: ...

Sixth, each of the patterns in the array is matched in order. The first one wins

"%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",
"%{MONTH:date}  %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: gravity %{WORD:action} %{DATA:query_hostname} is %{IPV4:resolved_ip}",

%{WORD:action} will match gravity, and then %{DATA:query_hostname} will match everything up to is. The second, more specific pattern, should come before the first.

Seventh, there is no way to distinguish between these two patterns. The first will always match anything that would match the second

"%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:resolved_ip} is %{GREEDYDATA:result}",
"%{MONTH:date} %{MONTHDAY:date} %{TIME:date} %{WORD:service}\[%{NUMBER:pid}\]: %{WORD:action} %{IPV4:resolved_ip} is %{GREEDYDATA:query_hostname}",

Eighth, you may be able to disambiguate some of your patterns by replacing DATA or GREEDYDATA with more specific patterns like NOTSPACE or HOSTNAME.

2 Likes

Thank you very much. I will work on your recommendations (although I expect them to take time given the depth and quality of your recommendation [touching CPU cycles :D]).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.