Issue with gsub and regex not affecting the resulting strings


I'm trying to strip specific parts of a returned string so that I get accurately aggregated file types in Elasticsearch. The general format of the TrID output is this:

34.2% (.DLL) Win32 Dynamic Link Library (generic)
64.5% (.EXE) Win32 Executable MS Visual C++ (generic)
100% (.EXE) DOS Executable Generic
49.9% (.EXE) Generic Win/DOS Executable
33.9% (.EXE) Win 16/32 Executable Delphi Generic
81.9% (.EXE) Generic CLI Executable (.NET,Mono,etc)
79.7% (.EXE) Win32 EXE PECompact compressed (generic)
53.1% (.EXE) Win32 EXE PECompact compressed (v2.x)

I have a regex to remove the leading numerical percentages and the trailing '(generic)' strings, but the field comes through unchanged. This is in my Logstash .conf file. The field 'trid' is pulled in through a JDBC connection/query, if that matters.

Filter code:

input {
   jdbc {
      connection info ....
      statement => "select id, name, ......, trid, ....."
   }
}
filter {
   mutate {
      gsub => [ "trid", "/\d+(\.\d+)*%\s(.*)(\s\(generic\))*$/gmU", "" ]
   }
}
output {
   elasticsearch {
      hosts => [ "host" ]
      index => ["index"]
   }
   stdout {}
}

What am I doing wrong here?

The second string in the array is converted to a Regexp, so you cannot use the // delimiters or trailing modifiers. If you remove those, the pattern matches and replaces the entire field with "". If you reference a capture group by changing the third entry to "\2", it still does not work: because the trailing (generic) group is optional, the greedy (.*) captures the whole rest of the field, (generic) included. Use two triplets instead, one per substitution:

mutate {
    gsub => [
        "message", "^\d+(\.\d+)*% ", "",
        "message", "\s\(generic\)$", ""
    ]
}
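To see why the single pattern falls short and the two-step version works, the behaviour can be reproduced outside Logstash. The following is only a rough sketch in Python rather than the Ruby regex engine that Logstash uses; for patterns this simple the two should behave the same on single-line values, but that is an assumption. The `clean` helper and the `samples` list are hypothetical names for illustration.

```python
import re

# A few sample TrID lines taken from the question.
samples = [
    "34.2% (.DLL) Win32 Dynamic Link Library (generic)",
    "100% (.EXE) DOS Executable Generic",
    "53.1% (.EXE) Win32 EXE PECompact compressed (v2.x)",
]

# The original single pattern with the /.../gmU wrapper removed.
single = re.compile(r"\d+(\.\d+)*%\s(.*)(\s\(generic\))*$")

# The two patterns from the answer, applied one after the other.
leading_percent = re.compile(r"^\d+(\.\d+)*% ")
trailing_generic = re.compile(r"\s\(generic\)$")


def clean(value):
    """Strip the leading percentage, then a trailing ' (generic)'."""
    value = leading_percent.sub("", value)
    return trailing_generic.sub("", value)


for s in samples:
    # Replacing with group 2 keeps "(generic)" because the greedy (.*)
    # already consumed it before the optional group was tried.
    greedy = single.sub(r"\2", s)
    print(f"single pattern: {greedy!r}")
    print(f"two patterns:   {clean(s)!r}")
```

Note that only the lowercase, parenthesised "(generic)" suffix is removed; a value such as "DOS Executable Generic" keeps its last word.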

I implemented your solution but there still isn't any change to the resulting output.


Did you modify my mutate to update [trid] rather than [message]?


That...that was it :man_facepalming: It works now. Thank you so much.
