Using regex in gsub raises CPU usage significantly

Ged · October 5, 2019, 2:25am

Hi guys,

I want to share my observations about using gsub with regex.

I use gsub to extract the application instance name from the log file path.
Example path: /logs/pppas/logs_tcc/app619_pppas-prod1/payload.log where
app619_pppas-prod1 is the instance name.

Originally I used static values in gsub:

mutate {
  gsub => [path, "/logs/pppas/logs_tcc/", ""]
  gsub => [path, "/payload.log", ""]
}

and that worked fine.
But then I changed it to the following, using a regex:

mutate {
  gsub => [path, ".*\/([^\/]+)+\/+[^\/]+", "\1"]
}

and then my CPU usage doubled, from 50% to 100%, on a 2x6-core machine!

So be careful when using regex in gsub.

Could anyone take a look and explain the reason?

Thanks
Ged

Christian_Dahlqvist · October 5, 2019, 5:02am

I would say that is expected, as regex processing can, depending on the pattern, be a lot more CPU intensive than matching static strings.

Ged · October 5, 2019, 11:33am

Correct, and when there is a huge number of log lines to process, the impact on CPU usage is significant. I just posted this so that Logstash users are aware of it.

Thanks!

Ged

Badger · October 5, 2019, 1:09pm

That could backtrack a lot. It might be faster if you anchor it to the end of line using $.

mutate { gsub => [path, ".*\/([^\/]+)+\/+[^\/]+$", "\1"] }
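
Another thing that may be worth trying, as an untested sketch rather than a measured fix: the nested quantifier ([^\/]+)+ can cause heavy backtracking on its own, because when the rest of the pattern fails on a segment the engine can retry many different ways of splitting that same segment. Assuming every path ends in /<instance>/<file>, a single + on the group should capture the same text:

mutate {
  # Assumption: path always ends in /<instance>/<file>.
  # Same pattern as above, but without the outer + on the capture group.
  gsub => [path, ".*\/([^\/]+)\/+[^\/]+$", "\1"]
}

For the example path in the first post, this should leave app619_pppas-prod1 in the path field.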

Ged · October 19, 2019, 12:12pm

Thanks a lot!
I'll try it and let you know the results.
I hope Logstash compiles the regex only once, when the configuration is loaded, and keeps it in memory rather than recompiling it for every event.

Ged

Badger · October 19, 2019, 12:55pm

The regular expressions are compiled during initialization and reused.
