We are doing Apache access log ingestion into Logstash from publicly-facing web servers. This works correctly, but after a day or two some "crafty" requests come in, either from a security scan or a hack attempt, Logstash CPU goes to about 400%, and nothing is sent to Elasticsearch until I "kill -9" the Logstash process and restart it. Even a normal "systemctl restart logstash" won't work.
Is there a way to detect and sanitize such inputs before the logstash process becomes unresponsive?
Here are some examples of logs that come in during this time, though I'm not certain that these are the ones causing the issue.
[2021-12-27T07:10:54,461][WARN ][logstash.filters.grok ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (1257 bytes)! First 255 chars are: ????BBITp??Z?T????[????#+f#???????z???+K6?i!?????????h?-*?n??&?KADS??????k?*VW????0??3?L|%?49??>??#n?]?\???n?"s.???k? ?C?lgY?M??y???n?V???]CN?Ú???w??T??`?>k?C?xS????y????O{xr;???7??:???!??8?,ujn?Sm???9?x????,?z??lxãj??^?]??F??['!
[2021-12-27T07:10:55,158][WARN ][logstash.filters.grok ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value '????6????_9 ????N??x??tEl?????%|I?8?g??~??[S???zE???????????nm ???3?????j???W^?0???????0"?3@????.?e;?b??)?;?p'!??Bp?XoRN???xo9??G????&+?B????9?Om??i[?u??Ovbde.?-%??k??Z????P???V?N????Y???
[2021-12-27T07:11:25,160][WARN ][logstash.filters.grok ][main][eb3b98f9bd73604fe58153001fdc168fbcde6bed786c335f95262cc4d9211e58] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (672 bytes)! First 255 chars are: ??8?2??G???rZ?~?%????<]???Z?B??@E???5d??Y????%?5??Kz
5D6??y9O??N+t???
??Aoi r?? ??0?g_?!Lzµ??\??
??r?I???S?#=??|x????|r??)[??????Yw&h+U T?%?E?v`????;<??fu?C9??M??????q????????j?x$-??H??f?Gr???rMq?????
.????Y??*X$?dm\'!
If the grok is timing out, then those messages are probably the cause of the high CPU.
The default timeout is 30 s (30000 milliseconds); you can try reducing it with the timeout_millis option.
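For example, a rough sketch of what that looks like in the filter block (the 3000 ms value is just an illustration, and CUSTOM_HTTPD_COMBINEDLOG is assumed to be the pattern name from your config):

    filter {
      grok {
        match => { "message" => "%{CUSTOM_HTTPD_COMBINEDLOG}" }
        # illustrative value only; tune it to your pipeline
        timeout_millis => 3000
      }
    }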
Also, are you anchoring your grok patterns with ^ or $? This can make grok fail faster, avoiding the timeouts. You can read more about it in this blog post.
If you do not use anchors, the grok filter will try to match the pattern starting at every position in the message, which can lead to timeouts and increased CPU usage; see the sketch below.
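A minimal sketch of an anchored match (same assumed pattern name; the anchors are the point):

    filter {
      grok {
        # ^ and $ force the pattern to match the whole line, so input that
        # cannot match fails immediately instead of being retried at every offset
        match => { "message" => "^%{CUSTOM_HTTPD_COMBINEDLOG}$" }
      }
    }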
For whatever it's worth, here are some additional logs that just came in, of the sort that was potentially causing hangs last week. I've set the timeout to 2 seconds and Logstash still appears to be working.
[2021-12-27T10:51:41,229][WARN ][logstash.filters.grok ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (606 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.757 -0600] "GET /wwwboard.cgi HTTP/1.1" 403 1584 "-" "() { ignored; }; echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6271_rce Output : $((71+75))\"" "sticky_server=sticky.89b2d2498a1bf6ecf6b5f0d0eba3cf'!
[2021-12-27T10:51:43,265][WARN ][logstash.filters.grok ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (620 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.764 -0600] "GET /ucsm/isSamInstalled.cgi HTTP/1.1" 302 225 "-" "() { _; } >_[$($())] { echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6278 Output : $((20+77))\"; }" "sticky_server=sticky.89b2d2498a1bf6e'!
[2021-12-27T10:51:45,274][WARN ][logstash.filters.grok ][main][9427e4a996e91bcb268133b8a5d92e7631d1d6c130fb109f0a8d84fa7c762532] Timeout executing grok '%{CUSTOM_HTTPD_COMBINEDLOG}' against field 'message' with value 'Value too large to output (606 bytes)! First 255 chars are: 44.242.181.181 - - [27/Dec/2021:10:49:50.826 -0600] "GET /xampp/cgi.cgi HTTP/1.1" 302 225 "-" "() { ignored; }; echo Content-Type: text/plain ; echo ; echo \"bash_cve_2014_6271_rce Output : $((71+75))\"" "sticky_server=sticky.89b2d2498a1bf6ecf6b5f0d0eba3cf'!
HTTPD_COMMONLOG is not anchored in the default httpd patterns (which feels like a bug to me), so you can probably eliminate all your timeouts by adding a ^ in just one place.
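For example, a hedged sketch of doing that through a custom patterns file (the file path and the CUSTOM_HTTPD_COMBINEDLOG name are assumptions; HTTPD_COMBINEDLOG is the stock pattern being wrapped):

    # e.g. /etc/logstash/patterns/httpd_anchored (hypothetical path)
    CUSTOM_HTTPD_COMBINEDLOG ^%{HTTPD_COMBINEDLOG}$

    # and in the pipeline:
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "message" => "%{CUSTOM_HTTPD_COMBINEDLOG}" }
    }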
Here's an example of one particular raw message that times out. It makes me wonder if Logstash is having some issue with all the escapes nested inside the parens and braces. According to grokconstructor, it matches properly with the rules I posted above.
I think I figured it out. There are fields that I overrode with %{NOTSPACE} that should in fact be %{QS}, because that takes the weird escaped characters into account.
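To illustrate, a hypothetical version of the tail of that custom pattern (QS is the stock quoted-string pattern, which copes with the escaped quotes inside those shellshock-style user agents, whereas NOTSPACE stops at the first space):

    # hypothetical custom pattern; the referrer/agent fields are the point
    CUSTOM_HTTPD_COMBINEDLOG ^%{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}$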
On a side note, I would suggest that the default HTTPD_COMMONLOG pattern in Logstash be modified so that the first field can be a dash character (-). We have log lines that won't match otherwise.
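For illustration only (a hypothetical pattern name; the alternation is the suggested change):

    # accept either an IP/hostname or a literal "-" as the client field
    CUSTOM_HTTPD_CLIENT (?:%{IPORHOST:clientip}|-)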