Back references in mutate gsub?

I'm in the middle of implementing a PostgreSQL log parser that will normalize queries so I can compute statistics on them. As part of this, I need to search for parts of a field and replace them with back references plus some extra text. Here is one such line (the replacement string contains a \1, which, in Ruby at least, should bring in whatever the first () group in the search pattern matched):

    mutate {
      gsub => [
        "normalized_query", "([^a-z_$-])-?([0-9]+)", "\10"
      ]
    }

Question 1: Does this work? I won't get a chance to test it today, but I hoped someone could chime in and confirm my simple-minded belief that a Ruby back reference would work here. I hope it does.

Question 2: If you have several gsubs in the same mutate filter, all operating on the same field, are the changes applied in serial order? I didn't see anything about this in the docs. I assume so, but thought I'd ask.

Thanks!

As it turns out, I was able to test this morning, so let me share what I found.

Back references do indeed work in a gsub mutate statement.
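To see what the `\10` replacement from the question does, here is a minimal sketch in plain Ruby. It assumes the mutate filter turns the pattern string into a `Regexp` and passes it, with the replacement, to Ruby's `String#gsub`; the sample query is hypothetical. Ruby's numbered back references are a single digit, so `\10` means group 1 followed by a literal `0`, which is exactly what this normalization wants. Because `[^a-z_$-]` excludes only lowercase letters, the sample is lowercase, and the digit in `col1` survives since the `l` before it doesn't match the class.

```ruby
# Sketch: the pattern and replacement from the question, fed to Ruby's
# String#gsub as strings (assumption: mutate does essentially the same).
pattern     = Regexp.new('([^a-z_$-])-?([0-9]+)')
replacement = '\10' # \1 (first capture group) followed by a literal "0";
                    # Ruby back references are single digits, so NOT group 10

query      = "select * from t where col1 = 42 and x > -7" # hypothetical query
normalized = query.gsub(pattern, replacement)

puts normalized # prints: select * from t where col1 = 0 and x > 0
```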

Example: swap the two parts of a field value that are separated by its first period. Note that the replacement string HAS to be delimited by apostrophes (single quotes), not double quotes.

    mutate {
      gsub => [
        # Back reference test: swap the text before and after the first period
        "duplicate_syslog_program", "^([^.]+)[.](.*)$", '\2.\1'
      ]
    }
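The swap can be reproduced in plain Ruby as a sketch, again assuming mutate hands the pattern to `String#gsub`; the field values are hypothetical. It also shows why the period between the groups should be matched literally (written `[.]` or `\.`): a bare `.` matches any character, so on a value with no period it silently steals the last character instead of leaving the value alone.

```ruby
# Sketch of the swap, reproduced with Ruby's String#gsub.
# '\2.\1' emits capture group 2, a period, then capture group 1.
literal_dot = Regexp.new('^([^.]+)[.](.*)$') # [.] matches only a period
any_char    = Regexp.new('^([^.]+).(.*)$')   # bare "." matches ANY character
swap = '\2.\1'

with_period   = "app.worker".gsub(literal_dot, swap) # hypothetical sample value
no_period     = "nodot".gsub(literal_dot, swap)      # no match: value unchanged
no_period_bad = "nodot".gsub(any_char, swap)         # "." swallows the final "t"

puts with_period   # prints: worker.app
puts no_period     # prints: nodot
puts no_period_bad # prints: .nodo
```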

And yes, if you define multiple gsub array elements, they are in fact applied in serial order to the mutated field, each one working on the result of the previous one, so you can edit the hell out of the same field in a consistent way within a single mutate block.
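A minimal Ruby sketch of what serial application means, assuming each gsub element runs on the output of the one before it, as the test above found. The two rules are hypothetical: the first is the number normalization from the question, and the second collapses an `IN (0, 0, 0)` list, so it only fires once the first has already run. Reversing the rules changes the result.

```ruby
# Hypothetical rules modelling a gsub array: [pattern, replacement] pairs
# applied one after another to the same field value.
RULES = [
  [Regexp.new('([^a-z_$-])-?([0-9]+)'), '\10'], # 1. numbers -> 0
  [Regexp.new('in \(0(, 0)*\)'), 'in (...)'],   # 2. collapse an IN list of zeros
].freeze

# Apply each rule to the output of the previous one (serial order).
def apply_in_order(value, rules)
  rules.reduce(value) { |acc, (pattern, replacement)| acc.gsub(pattern, replacement) }
end

query    = "select * from t where id in (1, 22, 333)"
forward  = apply_in_order(query, RULES)
reversed = apply_in_order(query, RULES.reverse)

puts forward  # prints: select * from t where id in (...)
puts reversed # prints: select * from t where id in (0, 0, 0)
```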

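However, if you need to perform some other mutate action, such as lowercase, in between a pair of gsubs, you have to put them into separate mutate blocks, at least as of Logstash 2.1. Within one block, mutate applies its operations in a fixed internal order (gsub always runs before lowercase), no matter how you order them in the config.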
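Why the split matters can be sketched in plain Ruby with the number pattern from the question, assuming that within one mutate block gsub runs before lowercase whatever order they are written in. The helper names and the query are hypothetical. Since `[^a-z_$-]` excludes only lowercase letters, an uppercase identifier like `COL1` has its digit rewritten when gsub runs first.

```ruby
# Hypothetical comparison: the number-normalizing gsub from the question,
# run before vs. after lowercasing the field.
NUMBERS = Regexp.new('([^a-z_$-])-?([0-9]+)') # class excludes lowercase letters only

# One mutate block: gsub runs first, then lowercase (assumed fixed order).
def gsub_then_lowercase(value)
  value.gsub(NUMBERS, '\10').downcase
end

# Separate blocks: lowercase in an earlier block, gsub in a later one.
def lowercase_then_gsub(value)
  value.downcase.gsub(NUMBERS, '\10')
end

query = "SELECT id FROM T WHERE COL1 = 5"
puts gsub_then_lowercase(query) # prints: select id from t where col0 = 0
puts lowercase_then_gsub(query) # prints: select id from t where col1 = 0
```

In config terms, getting the second result means a `mutate { lowercase => [ "normalized_query" ] }` block placed before the mutate block that holds the gsub.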
