Logstash CPU gets pegged, all processing stops with Apache logs

We are doing Apache accesslog ingestion into Logstash, from publicly-facing web servers. These work correctly, however after a day or two, some "crafty" requests come in, either from a security scan or hack attempt, and logstash CPU will go to about 400%, and nothing will be sent to Elastic until I "kill -9" the logstash process and restart it. Even doing a normal "systemctl restart logstash" won't work.

Is there a way to detect and sanitize such inputs before the logstash process becomes unresponsive?

Here are some examples of logs that come in during this time, however I'm not certain that these are the ones causing the issue.

[2021-12-27T07:10:54,461][WARN ][logstash.filters.grok    ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (1257 bytes)! First 255 chars are: ????BBITp??Z?T????[????#+f#???????z???+K6?i!?????????h?-*?n??&?KADS??????k?*VW????0??3?L|%?49??>??#n?]?\???n?"s.???k? ?C?lgY?M??y???n?V???]CN?Ú???w??T??`?>k?C?xS????y????O{xr;???7??:???!??8?,ujn?Sm???9?x????,?z??lxãj??^?]??F??['!
[2021-12-27T07:10:55,158][WARN ][logstash.filters.grok    ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value '????6????_9  ????N??x??tEl?????%|I?8?g??~??[S???zE???????????nm       ???3?????j???W^?0???????0"?3@????.?e;?b??)?;?p'!??Bp?XoRN???xo9??G????&+?B????9?Om??i[?u??Ovbde.?-%??k??Z????P???V?N????Y???
[2021-12-27T07:11:25,160][WARN ][logstash.filters.grok    ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (672 bytes)! First 255 chars are: ??8?2??G???rZ?~?%????<]???Z?B??@E???5d??Y????%?5??Kz
       5D6??y9O??N+t???
??Aoi   r??            ??0?g_?!Lzµ??\??
??r?I???S?#=??|x????|r??)[??????Yw&h+U  T?%?E?v`????;<??fu?C9??M??????q????????j?x$-??H??f?Gr???rMq?????
                                                                                                                .????Y??*X$?dm\'!

If the grok is timing out, then those messages are probably the cause of the high CPU.

The default timeout is 30s (30000 milliseconds), you can try to reduce it with the option timeout_millis.

Also, are you anchoring your grok patterns? Using ^ or $ ? This can makes grok faster and fails earlier, avoiding the timeouts. You can read more about it on this blog post.

If you do not use anchors, the grok filte will try to match pattern against other parts of the message, and this can lead to timeouts, which will increase the CPU usage.

Can you share your grok patterns?

1 Like

Thank you for the quick response. Here is what we are using for patterns:

NOTSPACE [^ ]+
HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} "%{NOTQUOTE:referrer}" "%{NOTQUOTE:useragent_string}"

NOTQUOTE [^"]+
CUSTOM_REQUEST_TIME \d+|%{NOTSPACE}
CUSTOM_PORT \d+|%{NOTSPACE}
CUSTOM_OUTLOOKSESSION %{NOTSPACE}

CUSTOM_HTTPD_COMBINEDLOG1 %{HTTPD_COMBINEDLOG} "%{NOTQUOTE}" %{CUSTOM_REQUEST_TIME:req_time} %{CUSTOM_PORT:port} %{IP:local_ip}
CUSTOM_HTTPD_COMBINEDLOG2 %{HTTPD_COMBINEDLOG} "%{CUSTOM_OUTLOOKSESSION}" %{CUSTOM_REQUEST_TIME:req_time} %{CUSTOM_PORT:port} %{IP:local_ip}
CUSTOM_HTTPD_COMBINEDLOG %{CUSTOM_HTTPD_COMBINEDLOG1}|%{CUSTOM_HTTPD_COMBINEDLOG2}}

You need to share your grok filters also.

Sorry, here they are:

filter {
  if [fields][logtype] == "apache_access_log" {

    if ![type] {
      mutate {
       add_field => { "type" => "apache_access_log" }
      }
    }

    grok {
      patterns_dir => "/etc/logstash/patterns/"
      add_field => [ "received_at", "%{@timestamp}" ]
      match => { "message" => "%{CUSTOM_HTTPD_COMBINEDLOG}" }
    }

    useragent {

      source => "useragent_string"
      prefix=> "ua_"

    }
    geoip {
      source => "clientip"
    }
    mutate {
      split => ["[host][hostname]", "."]
      add_field => { "shorthostname" => "%{[host][hostname][0]}" }
    }

    dns {
      reverse => [ "dnsname" ]
      action => "replace"
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS Z", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}

For whatever it's worth, here are some additional logs that just came in, that were potentially causing hangs last week. I've set the timeout to 2 seconds and logstash still appears to be working.

[2021-12-27T10:51:41,229][WARN ][logstash.filters.grok    ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (606 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.757 -0600] "GET /wwwboard.cgi HTTP/1.1" 403 1584 "-" "() { ignored; }; echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6271_rce Output : $((71+75))\"" "sticky_server=sticky.89b2d2498a1bf6ecf6b5f0d0eba3cf'!
[2021-12-27T10:51:43,265][WARN ][logstash.filters.grok    ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (620 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.764 -0600] "GET /ucsm/isSamInstalled.cgi HTTP/1.1" 302 225 "-" "() { _; } >_[$($())] { echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6278 Output : $((20+77))\"; }" "sticky_server=sticky.89b2d2498a1bf6e'!
[2021-12-27T10:51:45,274][WARN ][logstash.filters.grok    ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (606 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.826 -0600] "GET /xampp/cgi.cgi HTTP/1.1" 302 225 "-" "() { ignored; }; echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6271_rce Output : $((71+75))\"" "sticky_server=sticky.89b2d2498a1bf6ecf6b5f0d0eba3cf'!

HTTPD_COMMONLOG is not anchored in the default httpd patterns (which feels like a bug to me), so you can probably eliminate all your timeouts by adding ^ on one line

CUSTOM_HTTPD_COMBINEDLOG ^%{CUSTOM_HTTPD_COMBINEDLOG1}|%{CUSTOM_HTTPD_COMBINEDLOG2}}

Interesting. Even with adding the carat I'm still seeing timeouts, but what you say makes sense.

Here's an example of one particular raw message that times out. It makes me wonder if logstash is having some issue with all the escapes nested inside of the parens and braces. According to grokconstructor, it matches properly with the above rules I posted.

166.205.181.46 - - [27/Dec/2021:11:50:56.907 -0600] "GET /contentImages/dontmissout.jpg HTTP/1.1" 200 52293 "https://myactivity.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A102U1 Build/RP1A.200720.012; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/96.0.4664.104 Mobile Safari/537.36 OcIdWebView ({\"os\":\"Android\",\"osVersion\":\"30\",\"app\":\"com.google.android.gms\",\"appVersion\":\"219\",\"style\":2,\"isDarkTheme\":false})" "-" 3939 443 172.24.2.139

I think I figured it out. There are fields that I overrode with "%{NOTSPACE}" that should in fact just be %QS, because that takes into account the weird escaped characters.

On a sidenote, I would suggest that the default pattern for HTTPD_COMMONLOG in logstash get modified so that the first field can be a dash character (-). We have log lines that won't match otherwise.

DEFAULT:
HTTPD_COMMONLOG %{IPORHOST:[source][address]} (?:-|%{HTTPDUSER:[apache][access][user][identity]}) (?:-|%{HTTPDUSER:[user][name]}) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:[http][request][method]} %{NOTSPACE:[url][original]}(?: HTTP/%{NUMBER:[http][version]})?|%{DATA})" (?:-|%{INT:[http][response][status_code]:int}) (?:-|%{INT:[http][response][body][bytes]:int})

SUGGESTED:
HTTPD_COMMONLOG (?:-|%{IPORHOST:[source][address]}) (?:-|%{HTTPDUSER:[apache][access][user][identity]}) (?:-|%{HTTPDUSER:[user][name]}) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:[http][request][method]} %{NOTSPACE:[url][original]}(?: HTTP/%{NUMBER:[http][version]})?|%{DATA})" (?:-|%{INT:[http][response][status_code]:int}) (?:-|%{INT:[http][response][body][bytes]:int})

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.