[logstash]what does "\1" mean in gsub?

cheriemilk · September 18, 2019, 8:04am

Hi team,

What does "\1" mean in the configuration below, and how do I use it? I can't find any official documentation that explains it.

mutate {
  gsub => ["message","(?<=SERVICEPERFDATA::)procs=(\d+);\S+", "\1"]
}

Is the configuration below valid?

gsub => ["message", "(\S+=)", ", \1"]

Thanks,
Cherie

abrx · September 18, 2019, 8:26am

Hi!
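Gsub is kind of like sed in bash.
In the second element, "(?<=SERVICEPERFDATA::)procs=(\d+);\S+" (the pattern you want to replace), parentheses define capture groups, and you can refer to them in the third element (the replacement you want in the end) with \1, \2 and so on. Here (\d+) is the only capture group, so \1 is the number after procs=; the (?<=...) part is a lookbehind and does not capture anything.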
Your last configuration has the three elements (field, pattern, replacement) that each gsub entry needs, so it is valid. You should check the doc here.
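
As a rough illustration of how \1 behaves, here is a sketch that uses Python's re module as a stand-in. Logstash's gsub uses Ruby regular expressions, but for these two patterns the lookbehind and backreference semantics should be the same. The sample input strings are made up for the demonstration.

```python
import re

# Hypothetical sample line; real perfdata may look different.
line = "SERVICEPERFDATA::procs=42;250;400;0;"

# (?<=...) is a lookbehind: it must precede the match, but it is not part of
# the match and is not a capture group. (\d+) is therefore group 1.
first = re.sub(r"(?<=SERVICEPERFDATA::)procs=(\d+);\S+", r"\1", line)
print(first)  # SERVICEPERFDATA::42

# The second pattern from the question: put ", " in front of every "key=".
second = re.sub(r"(\S+=)", r", \1", "a=1 b=2")
print(second)  # , a=1 , b=2
```

The whole match is replaced by the replacement string, and \1 inserts whatever the first capture group matched back into it.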

cheriemilk · September 18, 2019, 9:43am

Thank you abrx.

I have the messages below, and I want to replace only the spaces that separate the key=value pairs with '?', not the spaces inside a value such as cn=Deutsche Telekom AG.

Is this configuration correct?

gsub => ["message", "\S+(?\s)", "?"]

module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=GRIFOLS, S.A. cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EH] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[gender,department,division,location] if2=0 if3=1 if4=0
module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0

abrx · September 18, 2019, 3:42pm

Well, you should definitely build a test pipeline instead of guessing.

Here's a sample Logstash configuration you could use, assuming the sample data is in /tmp/test.input; the result will be written to /tmp/test.output in rubydebug format.
(This should work with almost all recent versions, as it's pretty simple.)

input {
    file {
        path => "/tmp/test.input"
        start_position => "beginning"
    }
}
filter {
    mutate {
        gsub => ["message"," ","?"]
    }
}
output {
    file {
        path => "/tmp/test.output"
        codec => rubydebug { metadata => true }
    }
}

In this case, with a simple gsub => ["message"," ","?"], we get the following result:

{
    "path" => "/tmp/test.input",
    "@timestamp" => 2019-09-18T15:32:00.644Z,
    "@metadata" => {
        "path" => "/tmp/test.input",
        "host" => "..."
    },
    "host" => "...",
    "@version" => "1",
    "message" => "module=SCM?fa=PP?at=SCM.PP.GENERATE?si=E02F07592081B6A58F7FFC3DE035D62B?ci=JShenSCM?cn=GRIFOLS,?S.A.?cs=SCM_JShenSCM.?pi=dbPool1?ui=new_user_id?locale=en_US?ktf1=[EM,EX,EH]?if1=2950?ktf3=3x3?ktf4=TEAMVIEW?ktf5=MYSELF?ktf6=ALL?ktf8=[gender,department,division,location]?if2=0?if3=1?if4=0"
}

Badger · September 18, 2019, 4:33pm

@cheriemilk, you didn't explain the question very well. The kv filter, as I know from the other question I answered, is not parsing a field like "cn=Deutsche Bank" correctly when field_split is set to " ".

You can fix this using gsub, as you suggest. You need to replace space with question mark (or something else), unless it is followed by a key and =. That is called a negative lookahead assertion, and it looks like this:

mutate { gsub => [ "message", " (?![a-z0-9]+=)", "?" ] }
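
To see why splitting on a plain space loses part of a value like "cn=Deutsche Bank", here is a small Python sketch. The naive_kv function is a hypothetical, very rough stand-in for a key=value splitter; it is not how the kv filter is actually implemented, and it only illustrates the general problem.

```python
line = "ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM."

def naive_kv(text, field_split=" "):
    """Rough stand-in for a key=value splitter (NOT the kv filter's real algorithm)."""
    result = {}
    for token in text.split(field_split):
        key, sep, value = token.partition("=")
        if sep:  # tokens without "=" (such as "Bank") are dropped
            result[key] = value
    return result

# Splitting on " " cuts the cn value at its inner space.
broken = naive_kv(line)
print(broken["cn"])  # Deutsche

# With a separator that does not occur inside values, the value survives.
fixed = naive_kv("ci=JShenSCM?cn=Deutsche Bank?cs=SCM_JShenSCM.", "?")
print(fixed["cn"])  # Deutsche Bank
```

This is why the gsub step has to rewrite only the separator spaces before the split on '?' is applied.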

cheriemilk · September 19, 2019, 8:05am

Hi @abrx
This is not the parsing result I expected, because the configuration also replaced the spaces inside the cn values with '?'. I don't want the spaces in cn values replaced.

Yes, I am trying to get the gsub working locally. I am just working out how to write the regex so that the replacement happens only in the right places.

cheriemilk · September 19, 2019, 8:08am

Hi @Badger
gsub is definitely a way to resolve this. The difficulty is how to write the correct regex so that it replaces only the expected spaces.

I just tried mutate { gsub => [ "message", " (?![a-z0-9]+=)", "?" ] }, and in local testing it replaces all the spaces in the message, but I want to keep the spaces in the value of the cn field.

Badger · September 19, 2019, 2:16pm

Using

input { generator { count => 1 lines => [ 'module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0' ] } }
filter {
    mutate { gsub => [ "message", " (?![a-z0-9]+=)", "?" ] }
}

I get

   "message" => "module=SCM fa=PP at=SCM.PP.GENERATE si=E02F07592081B6A58F7FFC3DE035D62B ci=JShenSCM cn=Deutsche?Bank cs=SCM_JShenSCM. pi=dbPool1 ui=new_user_id locale=en_US ktf1=[EM,EX,EMM] if1=2950 ktf3=3x3 ktf4=TEAMVIEW ktf5=MYSELF ktf6=ALL ktf8=[] if2=0 if3=1 if4=0",

What do you get?

cheriemilk · September 20, 2019, 1:37am

Hi @Badger
I got the same result as you yesterday, but cn=Deutsche?Bank is not what I expect. The expected result is cn=Deutsche Bank: the space inside the value should be kept, and only the spaces that act as field separators should be replaced with '?'.

Thanks,
Cherie

cheriemilk · September 20, 2019, 3:11am

@Badger

I updated the gsub to mutate { gsub => [ "message", "(\s)\w+=", "?" ] }. In regex101 it matched only the spaces that act as field separators; see https://regex101.com/r/ygsL8o/1

But when I configure it in Logstash, the result is still not what I expected.

module=SCM?PP?SCM.PP.GENERATE?E02F07592081B6A58F7FFC3DE035D62B?JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.?dbPool1?new_user_id?en_US?[]?2950?3x3?TEAMVIEW?MYSELF?ALL?[gender,department,division,location]?0?1?0\r

It replaces the whole match of (\s)\w+= with '?'. Shouldn't gsub replace only the \s inside the parentheses?

Thanks,
Cherie
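
A short sketch of what happens with that pattern, using Python's re module as a stand-in for Logstash's Ruby regex engine (the two should behave the same for these patterns). A substitution replaces the entire match, not just the parenthesised group. One way to keep the key is to capture it and put it back with \1. The shortened sample line is an assumption for illustration.

```python
import re

line = "ci=JShenSCM cn=Deutsche Telekom AG cs=SCM_JShenSCM. pi=dbPool1"

# The key and "=" are part of the match, so they are replaced with the space.
whole = re.sub(r"(\s)\w+=", "?", line)
print(whole)  # ci=JShenSCM?Deutsche Telekom AG?SCM_JShenSCM.?dbPool1

# Capture the key instead and re-insert it after the "?".
kept = re.sub(r"\s(\w+=)", r"?\1", line)
print(kept)  # ci=JShenSCM?cn=Deutsche Telekom AG?cs=SCM_JShenSCM.?pi=dbPool1
```

In Logstash the second variant would presumably be written as gsub => [ "message", "\s(\w+=)", "?\1" ], although that exact configuration was not tested in this thread.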

Badger · September 20, 2019, 4:57pm

The negative lookahead tells it to only substitute the space in Deutsche Bank. If you want to substitute every space except that one then use a positive lookahead.

mutate { gsub => [ "message", " (?=[a-z0-9]+=)", "?" ] }
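
The difference between the two lookaheads can be checked quickly outside Logstash. The sketch below uses Python's re module as a stand-in (Logstash uses Ruby regular expressions, which should treat these lookaheads the same way) on a shortened version of the sample line.

```python
import re

line = "module=SCM ci=JShenSCM cn=Deutsche Bank cs=SCM_JShenSCM. if1=2950"

# Negative lookahead: replace a space only when it is NOT followed by "key=".
negative = re.sub(r" (?![a-z0-9]+=)", "?", line)
print(negative)
# module=SCM ci=JShenSCM cn=Deutsche?Bank cs=SCM_JShenSCM. if1=2950

# Positive lookahead: replace a space only when it IS followed by "key=".
positive = re.sub(r" (?=[a-z0-9]+=)", "?", line)
print(positive)
# module=SCM?ci=JShenSCM?cn=Deutsche Bank?cs=SCM_JShenSCM.?if1=2950
```

A lookahead only inspects the text after the space and consumes nothing, so the key itself is left in place. Note that the pattern assumes keys consist of lowercase letters and digits only, as they do in the sample messages.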

cheriemilk · September 26, 2019, 5:41am

@Badger. Thanks for the solution

system · October 24, 2019, 5:41am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
