Please help with Grok pattern splitting out a filename

ZillaG · April 17, 2017, 3:20pm

One of these days I'll learn regex.

I have the following filename:

PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log

I've gotten it partly working with this pattern:

(?<logtype>[^-]+)-(?<run_id>[^-]+)-(?<job_id>[^-]+)-(?<capability>[^(0-9\.0-9\.0-9)]+)

logtype: PE
run_id: run1000hbgmm3f1
job_id: job1000hbgmm3dt

But I'm getting
capability: Output-Workflow-

...though I want it to be
capability: Output-Workflow-1000hbgmm3fb

...that is, all the text after the job_id up to the HH.mm.ss timestamp. The "capability" in this filename has 3 parts delimited by dashes, but sometimes it has only 2 parts and sometimes 4. So "capability" is everything after the job_id field up to the HH.mm.ss timestamp. Any help, please? Thanks!
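To reproduce the problem outside Logstash, here is a minimal sketch using Python's re module, assuming it treats this pattern the same way Grok's Oniguruma engine does (both read parentheses inside [...] as literal characters). Python spells named groups (?P<name>...) instead of (?<name>...); the capability class is copied unchanged from the pattern above.

```python
import re

# Same pattern as in the post, with Python's (?P<name>...) group syntax.
# The capability character class is copied exactly as written.
original = re.compile(
    r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-"
    r"(?P<capability>[^(0-9\.0-9\.0-9)]+)"
)

filename = "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log"
m = original.match(filename)

print(m.group("logtype"), m.group("run_id"), m.group("job_id"))
print(m.group("capability"))  # Output-Workflow-
```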

birkoff · April 17, 2017, 4:50pm

(?<logtype>[^-]+)-(?<run_id>[^-]+)-(?<job_id>[^-]+)-(?<capability>.+)-
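This works because .+ is greedy: it first consumes the rest of the string, then backtracks until the trailing - can match, which is the last dash in the name. The timestamp part contains no dashes, so that is the dash just before it. A minimal sketch in Python's re (named groups spelled (?P<name>...)), checked against the filename from the question plus two hypothetical variants with 2- and 4-part capabilities:

```python
import re

# birkoff's pattern, using Python's (?P<name>...) group syntax.
# The greedy .+ backtracks to the LAST "-", which only lands before the
# timestamp because "22.07.17.log" itself contains no dashes.
pattern = re.compile(
    r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-(?P<capability>.+)-"
)

# First name is from the post; the other two are hypothetical variants.
filenames = [
    "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log",
    "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-22.07.17.log",
    "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-Extra-1000hbgmm3fb-22.07.17.log",
]

captures = [pattern.match(name).group("capability") for name in filenames]
for capability in captures:
    print(capability)
```

If a dash ever appeared after the timestamp, the capture would run past it, so this relies on the filename layout staying as shown.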

ZillaG · April 17, 2017, 5:01pm

@birkoff, thanks, that works. Could you explain why the pattern for capability didn't just capture "Output", that is, the text up to the first dash?

magnusbaeck · April 18, 2017, 7:34am

Could you explain why the pattern for capability didn't just capture "Output", that is, the text up to the first dash?

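Because with [^(0-9\.0-9\.0-9)]+ you're capturing one or more characters of any kind as long as they're not digits, periods, or parentheses. With [...] you're defining a set of characters. It's not a subexpression: the parentheses are just literal members of the set, and repeating 0-9 and \. adds nothing. That part of your expression is equivalent to [^()0-9.]+. Since the dash isn't in the set, the match runs straight through "Output-Workflow-" and stops at the first digit, the 1 in 1000hbgmm3fb.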
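A quick way to convince yourself of the equivalence is to compare the two classes character by character. The sketch below uses Python's re, assuming it parses bracket expressions like Grok's Oniguruma engine for this case. The last part is a hypothetical variant (not from the thread) showing how "stop at the timestamp" can be written as a sequence outside the class, with an added time group for illustration.

```python
import re
import string

# The class from the original pattern vs. the simplified form.
original_class = re.compile(r"[^(0-9\.0-9\.0-9)]")
simplified_class = re.compile(r"[^()0-9.]")

# Any character where the two classes disagree would show up here.
mismatches = [
    ch for ch in string.printable
    if bool(original_class.fullmatch(ch)) != bool(simplified_class.fullmatch(ch))
]
print(mismatches)  # []

# The dash is not excluded, so the class runs on until the first digit.
prefix = re.match(r"[^()0-9.]+", "Output-Workflow-1000hbgmm3fb-22.07.17.log").group()
print(prefix)  # Output-Workflow-

# Hypothetical variant: match the HH.mm.ss timestamp as a sequence instead.
timestamp_anchored = re.compile(
    r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-"
    r"(?P<capability>.+)-(?P<time>\d{2}\.\d{2}\.\d{2})\.log$"
)
m = timestamp_anchored.match(
    "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log"
)
print(m.group("capability"), m.group("time"))
```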

system · May 16, 2017, 7:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.