Hello,
I'm trying to strip specific parts out of a returned string so that I get accurate filetype aggregations in Elasticsearch. The general format of the TrID output is this:
34.2% (.DLL) Win32 Dynamic Link Library (generic)
64.5% (.EXE) Win32 Executable MS Visual C++ (generic)
100% (.EXE) DOS Executable Generic
49.9% (.EXE) Generic Win/DOS Executable
33.9% (.EXE) Win 16/32 Executable Delphi Generic
81.9% (.EXE) Generic CLI Executable (.NET,Mono,etc)
79.7% (.EXE) Win32 EXE PECompact compressed (generic)
53.1% (.EXE) Win32 EXE PECompact compressed (v2.x)
I have a regex to remove the leading numerical percentages and the trailing '(generic)' strings, but I'm not getting results. This is in my Logstash .conf file. The 'trid' field is pulled in through a JDBC connection/query, if that matters.
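To make the goal concrete, here is a minimal sketch in Python (not Logstash) of the transformation I'm after. The function name normalize_trid and the two patterns are only illustrative assumptions; the sample lines are taken from the TrID output above.

```python
import re

# Sample TrID lines copied from the output above.
samples = [
    "34.2% (.DLL) Win32 Dynamic Link Library (generic)",
    "100% (.EXE) DOS Executable Generic",
    "53.1% (.EXE) Win32 EXE PECompact compressed (v2.x)",
]

def normalize_trid(line: str) -> str:
    """Strip the leading percentage and a trailing '(generic)' tag.

    Hypothetical helper, for illustration only.
    """
    # Leading "34.2% " or "100% "
    line = re.sub(r"^\d+(\.\d+)?%\s+", "", line)
    # Trailing " (generic)"; other suffixes such as "(v2.x)" are kept
    line = re.sub(r"\s+\(generic\)$", "", line)
    return line

for s in samples:
    print(normalize_trid(s))
```

So "34.2% (.DLL) Win32 Dynamic Link Library (generic)" should end up as "(.DLL) Win32 Dynamic Link Library", while a suffix like "(v2.x)" stays in place.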
Filter code:
input {
  jdbc {
    connection info ....
    statement => "select id, name, ......, trid, ....."
  }
}
filter {
  mutate {
    gsub => [ "trid", "/\d+(\.\d+)*%\s(.*)(\s\(generic\))*$/gmU", "" ]
  }
}
output {
  elasticsearch {
    hosts => [ "host"]
    index => ["index"]
  }
  stdout {}
}
What am I doing wrong here?