We are doing Apache access log ingestion into Logstash from publicly-facing web servers. This works correctly, but after a day or two some "crafty" requests come in, either from a security scan or a hack attempt, Logstash CPU goes to about 400%, and nothing is sent to Elasticsearch until I "kill -9" the Logstash process and restart it. Even a normal "systemctl restart logstash" won't work.
Is there a way to detect and sanitize such inputs before the logstash process becomes unresponsive?
Here are some examples of logs that come in during this time, though I'm not certain that these are the ones causing the issue.
[2021-12-27T07:10:54,461][WARN ][logstash.filters.grok ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (1257 bytes)! First 255 chars are: ????BBITp??Z?T????[????#+f#???????z???+K6?i!?????????h?-*?n??&?KADS??????k?*VW????0??3?L|%?49??>??#n?]?\???n?"s.???k? ?C?lgY?M??y???n?V???]CN?Ú???w??T??`?>k?C?xS????y????O{xr;???7??:???!??8?,ujn?Sm???9?x????,?z??lxãj??^?]??F??['!
[2021-12-27T07:10:55,158][WARN ][logstash.filters.grok ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value '????6????_9 ????N??x??tEl?????%|I?8?g??~??[S???zE???????????nm ???3?????j???W^?0???????0"?3@????.?e;?b??)?;?p'!??Bp?XoRN???xo9??G????&+?B????9?Om??i[?u??Ovbde.?-%??k??Z????P???V?N????Y???
[2021-12-27T07:11:25,160][WARN ][logstash.filters.grok ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (672 bytes)! First 255 chars are: ??8?2??G???rZ?~?%????<]???Z?B??@E???5d??Y????%?5??Kz
5D6??y9O??N+t???
??Aoi r?? ??0?g_?!Lzµ??\??
??r?I???S?#=??|x????|r??)[??????Yw&h+U T?%?E?v`????;<??fu?C9??M??????q????????j?x$-??H??f?Gr???rMq?????
.????Y??*X$?dm\'!
If the grok is timing out, then those messages are probably the cause of the high CPU.
The default timeout is 30 s (30000 milliseconds); you can try reducing it with the timeout_millis option.
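For example, a rough sketch of what that looks like in the filter block (the 3000 ms value is just an illustration, and CUSTOM_HTTPD_COMBINEDLOG is assumed to be the pattern name from your config):

    filter {
      grok {
        match => { "message" => "%{CUSTOM_HTTPD_COMBINEDLOG}" }
        # illustrative value only; tune it to your pipeline
        timeout_millis => 3000
      }
    }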
Also, are you anchoring your grok patterns with ^ or $? This can make grok fail faster, avoiding the timeouts. You can read more about it in this blog post.
If you do not use anchors, the grok filter will try to match the pattern starting at every position in the message, which can lead to timeouts and increased CPU usage; see the sketch below.
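A minimal sketch of an anchored match (same assumed pattern name; the anchors are the point):

    filter {
      grok {
        # ^ and $ force the pattern to match the whole line, so input that
        # cannot match fails immediately instead of being retried at every offset
        match => { "message" => "^%{CUSTOM_HTTPD_COMBINEDLOG}$" }
      }
    }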
For whatever it's worth, here are some additional logs that just came in, of the sort that was potentially causing hangs last week. I've set the timeout to 2 seconds and Logstash still appears to be working.
[2021-12-27T10:51:41,229][WARN ][logstash.filters.grok ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (606 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.757 -0600] "GET /wwwboard.cgi HTTP/1.1" 403 1584 "-" "() { ignored; }; echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6271_rce Output : $((71+75))\"" "sticky_server=sticky.89b2d2498a1bf6ecf6b5f0d0eba3cf'!
[2021-12-27T10:51:43,265][WARN ][logstash.filters.grok ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (620 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.764 -0600] "GET /ucsm/isSamInstalled.cgi HTTP/1.1" 302 225 "-" "() { _; } >_[$($())] { echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6278 Output : $((20+77))\"; }" "sticky_server=sticky.89b2d2498a1bf6e'!
[2021-12-27T10:51:45,274][WARN ][logstash.filters.grok ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (606 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.826 -0600] "GET /xampp/cgi.cgi HTTP/1.1" 302 225 "-" "() { ignored; }; echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6271_rce Output : $((71+75))\"" "sticky_server=sticky.89b2d2498a1bf6ecf6b5f0d0eba3cf'!
HTTPD_COMMONLOG is not anchored in the default httpd patterns (which feels like a bug to me), so you can probably eliminate all your timeouts by adding a ^ in just one place.
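For example, a hedged sketch of doing that through a custom patterns file (the file path and the CUSTOM_HTTPD_COMBINEDLOG name are assumptions; HTTPD_COMBINEDLOG is the stock pattern being wrapped):

    # e.g. /etc/logstash/patterns/httpd_anchored (hypothetical path)
    CUSTOM_HTTPD_COMBINEDLOG ^%{HTTPD_COMBINEDLOG}$

    # and in the pipeline:
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "message" => "%{CUSTOM_HTTPD_COMBINEDLOG}" }
    }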
Here's an example of one particular raw message that times out. It makes me wonder if Logstash is having some issue with all the escapes nested inside the parens and braces. According to grokconstructor, it matches properly with the rules I posted above.
I think I figured it out. There are fields that I overrode with %{NOTSPACE} that should in fact be %{QS}, because that takes the weird escaped characters into account.
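To illustrate, a hypothetical version of the tail of that custom pattern (QS is the stock quoted-string pattern, which copes with the escaped quotes inside those shellshock-style user agents, whereas NOTSPACE stops at the first space):

    # hypothetical custom pattern; the referrer/agent fields are the point
    CUSTOM_HTTPD_COMBINEDLOG ^%{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}$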
On a side note, I would suggest that the default HTTPD_COMMONLOG pattern in Logstash be modified so that the first field can be a dash character (-). We have log lines that won't match otherwise.
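For illustration only (a hypothetical pattern name; the alternation is the suggested change):

    # accept either an IP/hostname or a literal "-" as the client field
    CUSTOM_HTTPD_CLIENT (?:%{IPORHOST:clientip}|-)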