Please help with Grok pattern splitting out a filename


(ZillaG) #1

One of these days I'll learn regex.

I have the following filename

PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log

I'm able to get this to work so...

(?<logtype>[^-]+)-(?<run_id>[^-]+)-(?<job_id>[^-]+)-(?<capability>[^(0-9\.0-9\.0-9)]+)

logtype: PE
run_id: run1000hbgmm3f1
job_id: job1000hbgmm3dt

But I'm getting
capability: Output-Workflow-

...though I want it to be
capability: Output-Workflow-1000hbgmm3fb

...that is, all the text after the job_id up to the HH.mm.ss timestamp. The "capability" in this filename has 3 parts delimited by dashes, but sometimes it has only 2 parts, sometimes 4. So "capability" is anything after the job_id field up to the HH.mm.ss timestamp. Any help please? Thanks!


(birkoff) #2

(?<logtype>[^-]+)-(?<run_id>[^-]+)-(?<job_id>[^-]+)-(?<capability>.+)-
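For anyone who wants to check this outside Logstash, here is a minimal Python sketch. It is only an approximation of what Grok does: Python's `re` module writes named groups as `(?P<name>...)` instead of the `(?<name>...)` syntax used above, so the pattern is translated, and the variable names are made up for illustration. It also contrasts the greedy `.+` with a lazy `.+?` to show why the greedy form runs to the last dash instead of stopping at the first one.

```python
import re

# Filename from the original post.
filename = "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log"

# Shared prefix: three dash-separated fields (Python named-group syntax).
prefix = r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-"

# Greedy: ".+" grabs as much as it can, then backtracks to the LAST dash.
greedy = re.compile(prefix + r"(?P<capability>.+)-")

# Lazy: ".+?" grabs as little as it can, so it stops at the FIRST dash.
lazy = re.compile(prefix + r"(?P<capability>.+?)-")

greedy_fields = greedy.match(filename).groupdict()
lazy_fields = lazy.match(filename).groupdict()

print(greedy_fields["capability"])  # Output-Workflow-1000hbgmm3fb
print(lazy_fields["capability"])    # Output
```

The greedy version only works here because the timestamp itself contains no dashes, so the last dash in the filename is always the one right before it.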


(ZillaG) #3

@birkoff, thanks, that works. Please explain how come the pattern for capability didn't just catch "Output", that is, the word until the first dash -?


(Magnus Bäck) #4

Please explain how come the pattern for capability didn't just catch "Output", that is, the word until the first dash -?

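Because with [^(0-9\.0-9\.0-9)]+ you're capturing one or more characters of any kind as long as they're not digits, periods, or parentheses. With [...] you're defining a set of characters. It's not a subexpression, so the parentheses and the repeated ranges don't group anything. That part of your expression is equivalent to [^()0-9.]+. A dash is allowed by that set, so the match runs through "Output-Workflow-" and stops at the first digit.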
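A small Python sketch can confirm how that character set behaves. It assumes Python's `re` module treats the set the same way Grok's regex engine does for this simple case, and the variable names are hypothetical. Only the tail of the filename (the part after job_id) is matched.

```python
import re

# The part of the filename that follows the job_id field.
tail = "Output-Workflow-1000hbgmm3fb-22.07.17.log"

# The set from the original pattern: inside [...], "(" and ")" are
# literal characters and the repeated "0-9" ranges add nothing new.
original_class = re.compile(r"[^(0-9\.0-9\.0-9)]+")

# The same set written without the redundancy.
simplified_class = re.compile(r"[^()0-9.]+")

original_match = original_class.match(tail).group(0)
simplified_match = simplified_class.match(tail).group(0)

print(original_match)    # Output-Workflow-
print(simplified_match)  # Output-Workflow-
```

Both sets stop at the first digit of 1000hbgmm3fb, which is why capability came out as "Output-Workflow-".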

