[logstash] What does "\1" mean in gsub?

Hi team,

What does "\1" mean in the configuration below, and how do I use it? I can't find any official documentation that explains it.

mutate {
  gsub => ["message","(?<=SERVICEPERFDATA::)procs=(\d+);\S+", "\1"]
}

Is the configuration below valid?

gsub => ["message", "(\S+=)", ", \1"]

Thanks,
Cherie

Hi!
Gsub works a lot like sed.
In the second element, "(?<=SERVICEPERFDATA::)procs=(\d+);\S+" (the pattern you want to replace), parentheses define capture groups. You can refer to those groups in the third element (the replacement text) with \1, \2 and so on. Note that (?<=...) is a lookbehind and does not capture, so in this pattern \1 refers to the digits matched by (\d+).
Your last configuration is wrong because each gsub array needs 3 elements. You should check the doc here.
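Here is a minimal Ruby sketch of the capture-group behaviour described above. As far as I know, mutate's gsub hands the pattern to Ruby's String#gsub (Logstash runs on JRuby). The `apply_gsub` helper and the sample message are hypothetical. The helper only mimics reading the gsub array as [field, pattern, replacement] triples, with the event modelled as a plain Hash.

```ruby
# Hypothetical helper: reads a mutate-style gsub array as consecutive
# [field, pattern, replacement] triples and applies each one to a Hash "event".
def apply_gsub(event, gsub)
  gsub.each_slice(3) do |field, pattern, replacement|
    event[field] = event[field].gsub(Regexp.new(pattern), replacement)
  end
  event
end

# Hypothetical Nagios-style message; only the pattern comes from the thread.
event = { "message" => "host=web01 SERVICEPERFDATA::procs=42;100;200;0" }

# (?<=SERVICEPERFDATA::) is a lookbehind: the text must be present, but it is
# not part of the match and does not capture, so (\d+) is capture group 1.
# In single-quoted Ruby strings '\d' and '\1' keep their backslash, and gsub
# reads '\1' in the replacement as "the text captured by group 1".
apply_gsub(event, ["message", '(?<=SERVICEPERFDATA::)procs=(\d+);\S+', '\1'])

puts event["message"]  # => host=web01 SERVICEPERFDATA::42
```

Because the lookbehind is not part of the match, the "SERVICEPERFDATA::" prefix survives. Only "procs=42;100;200;0" is swapped for the captured "42".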

Thank you abrx.

I have the messages below. I want to replace only the spaces that separate the key-value pairs with '?', not the spaces inside values such as cn=Deutsche Telekom AG.

Is this configuration correct?

gsub => ["message", "\S+(?\s)", "?"]

module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=GRIFOLS, S.A. cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EH] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0
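For reference, the pattern \S+(?\s) in the question is not valid group syntax. After "(?" the regex engine expects a group type such as ":", "=" or "!". This pattern should therefore be rejected before any replacement happens. A quick check in plain Ruby follows, assuming the JRuby regex engine inside Logstash treats this pattern the same way.

```ruby
# The pattern from the question, as a single-quoted Ruby string.
pattern = '\S+(?\s)'

begin
  Regexp.new(pattern)
  compiled = true
rescue RegexpError => e
  compiled = false
  # "(?" must be followed by a group type such as ":", "=", "!" or "<=";
  # "\s" is not one of them, so compilation fails.
  puts "RegexpError: #{e.message}"
end

puts compiled  # => false
```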

Well, you should definitely build a test pipeline instead of guessing.

Here's a sample Logstash configuration you could use. It assumes the sample data is in /tmp/test.input and writes the result to /tmp/test.output using the rubydebug codec (including @metadata), so you can see exactly what the filter did.
(It works with almost any recent version, since it's pretty simple.)

input {
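    file {
        path => "/tmp/test.input"
        start_position => "beginning"
        # don't persist the read position, so each new test run re-reads the file
        sincedb_path => "/dev/null"
    }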
}
filter {
    mutate {
        gsub => ["message"," ","?"]
    }
}
output {
    file {
        path => "/tmp/test.output"
        codec => rubydebug { metadata => true }
    }
}

With a simple gsub => ["message"," ","?"], the result is:

{
    "path" => "/tmp/test.input",
    "@timestamp" => 2019-09-18T15:32:00.644Z,
    "@metadata" => {
        "path" => "/tmp/test.input",
        "host" => "..."
    },
    "host" => "...",
    "@version" => "1",
    "message" => "module=SCM?fa=PP?at=SCM.PP.GENERATE?si=E02F07592081B6A58F7FFC3DE035D62B?ci=JShenSCM?cn=GRIFOLS,?S.A.?cs=SCM_JShenSCM.?pi=dbPool1?ui=new_user_id?locale=en_US?ktf1=[EM,EX,EH]?if1=2950?ktf3=3x3?ktf4=TEAMVIEW?ktf5=MYSELF?ktf6=ALL?ktf8=[gender,department,division,location]?if2=0?if3=1?if4=0"
}
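Here is the same replacement in plain Ruby, on an excerpt of the second sample line. It shows why a literal space pattern can't work here: it has no way to tell a separator from a space inside a value. The excerpt is taken from the sample data; the Ruby form is only an approximation of what mutate does.

```ruby
# An excerpt of the second sample line.
line = "ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM."

# Equivalent of gsub => ["message"," ","?"]: every space is replaced,
# including the two inside the cn value.
replaced = line.gsub(/ /, "?")
puts replaced  # => ci=JShenSCM?cn=Deutsche?Telekom?AG?cs=SCM_JShenSCM.

# Splitting on "?" now breaks the cn value into three pieces.
p replaced.split("?")
# => ["ci=JShenSCM", "cn=Deutsche", "Telekom", "AG", "cs=SCM_JShenSCM."]
```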

@cheriemilk, you didn't explain the question very well. As I know from the other question I answered, the kv filter does not parse a field like "cn=Deutsche Bank" correctly when field_split is set to " ".

You can fix this using gsub, as you suggest. You need to replace each space with a question mark (or some other character) unless it is followed by a key and =. That is called a negative lookahead assertion, and it looks like this:

mutate { gsub => [ "message", " (?![a-z0-9]+=)", "?" ] }
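Below is a plain-Ruby check of this negative lookahead on an excerpt of the third sample line, assuming Logstash's gsub behaves like Ruby's String#gsub here. Note that the pattern assumes keys consist only of lowercase letters and digits. That holds for the sample data.

```ruby
# An excerpt of the third sample line.
line = "ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM. pi=dbPool1"

# " (?![a-z0-9]+=)" matches a space only if it is NOT followed by a key
# and "=". The space before "Bank" has no key= after it, so it is the only
# one replaced; the separator spaces are left alone.
result = line.gsub(/ (?![a-z0-9]+=)/, "?")
puts result  # => ci=JShenSCM cn=Deutsche?Bank cs=SCM_JShenSCM. pi=dbPool1
```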

Hi @abrx
This is not the parsing result I expected, because the configuration also replaced the spaces in the cn values with '?'. I don't want the spaces in cn values replaced.

Yes, I am testing locally to get the gsub working. I'm just working out how to write the regex so that the replacement happens in the right places.

Hi @Badger
gsub is definitely a way to resolve this. The difficulty is writing a regex that replaces only the intended spaces.

I just tried mutate { gsub => [ "message", " (?![a-z0-9]+=)", "?" ] } locally, and it replaced all the spaces in the message, but I want to keep the spaces in the value of the cn field.

Using

input { generator { count => 1 lines => [ 'module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0' ] } }
filter {
    mutate { gsub => [ "message", " (?![a-z0-9]+=)", "?" ] }
}

I get

   "message" => "module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche?Bank cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0",

What do you get?

Hi @Badger
Yesterday I got the same result as you, but cn=Deutsche?Bank is not what I expect. The expected result is cn=Deutsche Bank. The space in the value should be kept, and only the spaces that act as field separators should be replaced with '?'.

Thanks,
Cherie

@Badger

I updated the gsub to mutate { gsub => [ "message", "(\s)\w+=", "?" ] }, and in regex101 it matched only the spaces that separate fields. Check here: https://regex101.com/r/ygsL8o/1

But when I configure it in Logstash, the result is still not what I expect:

module=SCM?PP?SCM.PP.GENERATE?E02F07592081B6A58F7FFC3DE035D62B?JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.?dbPool1?new_user_id?en_US?[]?2950?3x3?TEAMVIEW?MYSELF?ALL?[gender,department,division,location]?0?1?0\r

It replaces the whole match of (\s)\w+= with '?'. Shouldn't gsub replace only the \s inside the parentheses?
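That behaviour is expected. gsub replaces the entire match, and parentheses only capture text for use in the replacement; they don't limit what gets replaced. So the key and the = disappear along with the space. One alternative to the lookahead approach is to capture the key instead and put it back with \1. This is a different technique from the one suggested in the thread. The plain-Ruby sketch below uses a shortened sample line. In a Logstash config this would presumably be written as gsub => [ "message", "\s(\w+=)", "?\1" ], but that form is untested here.

```ruby
# A shortened version of the second sample line.
line = "module=SCM fa=PP ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM."

# Original attempt: the whole match (space + key + "=") is replaced by "?",
# so the keys are lost. Capturing (\s) alone does not change that.
lost = line.gsub(/(\s)\w+=/, "?")
puts lost  # => module=SCM?PP?JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.

# Capture the key instead and re-insert it with \1, so effectively only the
# separating space is swapped for "?".
kept = line.gsub(/\s(\w+=)/, '?\1')
puts kept  # => module=SCM?fa=PP?ci=JShenSCM?cn=Deutsche Telekom AG?cs=SCM_JShenSCM.
```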

Thanks,
Cherie

The negative lookahead tells it to substitute only the space in Deutsche Bank. If you want to substitute every space except that one, use a positive lookahead.

mutate { gsub => [ "message", " (?=[a-z0-9]+=)", "?" ] }
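Below is a plain-Ruby check of the positive lookahead on an excerpt of the second sample line. As a rough stand-in for the kv filter (this is not the kv filter itself), the sketch then splits on "?" and on the first "=" to show that cn keeps its spaces. In Logstash the equivalent step would presumably be kv with field_split set to "?". Like the negative lookahead, the pattern assumes keys contain only lowercase letters and digits.

```ruby
line = "module=SCM fa=PP ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM. " \
       "ktf8=[gender,department,division,location] if2=0"

# " (?=[a-z0-9]+=)" matches a space only when a key and "=" follow it,
# so the spaces inside "Deutsche Telekom AG" are left alone.
result = line.gsub(/ (?=[a-z0-9]+=)/, "?")
puts result
# => module=SCM?fa=PP?ci=JShenSCM?cn=Deutsche Telekom AG?cs=SCM_JShenSCM.?ktf8=[gender,department,division,location]?if2=0

# Rough stand-in for kv with field_split "?": split into pairs, then split
# each pair on the first "=" only.
fields = result.split("?").map { |pair| pair.split("=", 2) }.to_h
puts fields["cn"]    # => Deutsche Telekom AG
puts fields["ktf8"]  # => [gender,department,division,location]
```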

@Badger, thanks for the solution!