Using regex in gsub raises CPU usage significantly

Hi guys,

I want to share my observations about using gsub with regex.

I use gsub to extract the application instance name from the log file path.
Example path: /logs/pppas/logs_tcc/app619_pppas-prod1/payload.log, where
app619_pppas-prod1 is the instance name.

Originally I used static values in gsub:

mutate {
  gsub => [path, "/logs/pppas/logs_tcc/", ""]
  gsub => [path, "/payload.log", ""]
}

and that works fine.
But then I switched to the following, using a regex:

mutate {
  gsub => [path, ".*\/([^\/]+)+\/+[^\/]+", "\1"]
}

and then my CPU usage doubled from 50% to 100% on a 2x6-core machine!

So be careful with that.

Could anyone take a look and explain the reason?

Thanks
Ged

I would say that is expected: depending on the pattern, regex processing can be a lot more CPU-intensive than matching static strings.
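
To make the comparison concrete, here is a minimal Python sketch of the two approaches from the original post. Python's re module only stands in for Logstash's regex engine here, so this shows that both approaches produce the same value, not how they perform inside Logstash.

```python
import re

path = "/logs/pppas/logs_tcc/app619_pppas-prod1/payload.log"

# Static approach: strip the fixed prefix and suffix, as in the first config.
static_result = path.replace("/logs/pppas/logs_tcc/", "").replace("/payload.log", "")

# Regex approach: the pattern from the second config, unchanged.
regex_result = re.sub(r".*\/([^\/]+)+\/+[^\/]+", r"\1", path)

print(static_result)  # app619_pppas-prod1
print(regex_result)   # app619_pppas-prod1
```

The static version only has to find two fixed substrings, while the regex version has to search for a position where the whole pattern fits, which is where the extra work comes from.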

Correct, and with a huge number of log lines to process, the impact on CPU usage is significant. I just posted this so that Logstash users are aware of it.

Thanks!

Ged

That could backtrack a lot. It might be faster if you anchor it to the end of the line using $.

mutate { gsub => [path, ".*\/([^\/]+)+\/+[^\/]+$", "\1"] }
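
One thing the anchor does not change is the nested quantifier ([^\/]+)+. A backtracking engine can split a run of n non-slash characters into groups in 2^(n-1) ways, and it may try many of those splits before giving up at a given position. Below is a sketch of a pattern without the outer +, again using Python's re module as a stand-in for Logstash's engine. The name "flat" is just for illustration, and nothing here has been benchmarked in Logstash.

```python
import re

path = "/logs/pppas/logs_tcc/app619_pppas-prod1/payload.log"

# Pattern from the reply above: anchored at the end, nested quantifier kept.
anchored = re.compile(r".*\/([^\/]+)+\/+[^\/]+$")

# Same idea with the outer "+" dropped and a start anchor added.
flat = re.compile(r"^.*/([^/]+)/+[^/]+$")

anchored_result = anchored.sub(r"\1", path)
flat_result = flat.sub(r"\1", path)

# Ways to split an 11-character name such as "payload.log" into
# consecutive groups: 2^(n-1).
splits = 2 ** (len("payload.log") - 1)

print(anchored_result)  # app619_pppas-prod1
print(flat_result)      # app619_pppas-prod1
print(splits)           # 1024
```

If the pattern behaves the same way in Logstash, the equivalent line would be gsub => [path, "^.*/([^/]+)/+[^/]+$", "\1"], but that is an assumption worth testing against real events first.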

Thanks a lot!
I'll try it and let you know the results.
I hope Logstash parses the regex only once, on configuration load, and keeps it in memory rather than parsing it again for every event.

Ged

The regular expressions are compiled during initialization and reused.
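
As a rough illustration of that compile-once pattern, here is a small Python sketch. GsubStep is a hypothetical class written for this example, not Logstash's actual code, and the second path is made up to show reuse across events.

```python
import re


class GsubStep:
    """Hypothetical stand-in for one configured gsub rule."""

    def __init__(self, field, pattern, replacement):
        self.field = field
        self.regex = re.compile(pattern)  # compiled once, at setup time
        self.replacement = replacement

    def apply(self, event):
        # Every event reuses the regex compiled in __init__.
        event[self.field] = self.regex.sub(self.replacement, event[self.field])
        return event


step = GsubStep("path", r".*\/([^\/]+)+\/+[^\/]+$", r"\1")

events = [
    {"path": "/logs/pppas/logs_tcc/app619_pppas-prod1/payload.log"},
    {"path": "/logs/pppas/logs_tcc/app620_pppas-prod2/payload.log"},  # made-up path
]
results = [step.apply(event)["path"] for event in events]
print(results)  # ['app619_pppas-prod1', 'app620_pppas-prod2']
```

Compiling once removes the per-event parsing cost, but the matching itself still runs for every event, so the shape of the pattern still matters.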
