How to replace 'special characters' with a logstash filter

(Eric) #1


I use the IMAP input to retrieve some e-mails (generated by a Windows application), and I can't figure out how to manipulate the data the way I want.

The e-mails look like this (simplified and anonymized):

Virus/malware : LNK_DUNIHI.SMIX
Computer : HOSTNAME
IP address : aaa.bbb.ccc.ddd
Field1 : Some\Information\Separated\By-Slash\
File : F:\Path to\a\specific file
Date/Time : 27/08/2015 15:48:38
Result : Some description text

But Logstash sees them like this (at least, this is what I see when I pass them through only a file input and a file output):

Virus/malware : LNK_DUNIHI.SMIX\r\nComputer : HOSTNAME\r\nIP address : aaa.bbb.ccc.ddd\r\nField1 : Some\\Information\\Separated\\By-Slash\\\r\nFile : F:\\Path to\\a\\specific file\r\nDate/Time : 27/08/2015 15:48:38\r\nResult : Some description text

-> all newlines are converted to '\r\n' and every '\' to '\\'
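
The '\r\n' and '\\' sequences may come from how the file output writes events rather than from the data itself. Assuming the file output is using its default json_lines codec, each event is serialized as JSON, and JSON escapes a CR LF line break as '\r\n' and a single backslash as '\\'. If so, the message field really holds line breaks and single backslashes. A minimal way to check is to write only the raw field text; the path below is a placeholder, not something from this thread:

output {
  file {
    # placeholder path
    path => "/tmp/imap-debug.log"
    # write only the message text instead of the JSON-encoded event
    codec => line { format => "%{message}" }
  }
}

If the file then shows the e-mail on separate lines with single backslashes, there is no '\\' to turn into '\', and only the line breaks need handling.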

I am trying to make the message easy to read and to grok (that is the initial need, but the '\\' causes problems), so I want to replace every '\r\n' with something else and every '\\' with '\'. The first replacement works, but the second does not. I use a mutate/gsub filter like this:

mutate {
    gsub => [
        "message", "\r\n", "X",
        "message", "\\", "Y"
    ]
}

The first gsub expression works, but the second one never does; I get the following error:

Error: Expected one of #, {, ,, ] at line 23, column 31 (byte 542) after filter {
    mutate {
        gsub => [
            "message", "\r\n", "X",
            "message", "\\", "

Maybe there is a syntax error that I can't find, or maybe the problem is a special character like \, which needs to be escaped with another \. But I tried putting '\\' and got the same error.

Can anyone help me with this?


It seems like Logstash has some issues when it comes to escaping things. See:


for more information.

According to: the following should replace backslashes, question marks, hashes and minuses:

filter {
  mutate {
    gsub => [
      # replace backslashes, question marks, hashes, and minuses
      # with a dot "."
      "fieldname2", "[\\?#-]", "."
    ]
  }
}

Maybe it works if you use "[\]"?
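
Applied to the original question, here is a sketch of the second gsub written as a character class. It assumes a Logstash version whose config parser passes string contents through without processing escapes, apart from treating \" as an escaped quote. That assumption would also explain the error above: in "\\", the final \" is read as an escaped quote, so the string never closes. Under the same assumption, "[\\]" reaches the regex engine as [\\], a class matching one literal backslash, whereas "[\]" would leave the class unterminated because \] escapes the closing bracket. Newer Logstash versions have a config.support_escapes setting that changes how these strings are read, so the exact form may need adjusting there.

filter {
  mutate {
    gsub => [
      # CR LF line breaks -> "X" (unchanged from the question)
      "message", "\r\n", "X",
      # one literal backslash -> "Y"; the brackets keep the string from ending in \"
      "message", "[\\]", "Y"
    ]
  }
}

This replaces every backslash in the field, so it only makes sense if substituting all of them is actually the goal.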

(Eric) #3

Thanks for this tip, even though it doesn't work in my case.
In the end, I kept the first mutate/gsub filter for '\r\n' and improved my grok filter to parse the fields better. For now, I'll leave my double backslashes as they are.
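
The final grok pattern isn't shown in the thread. As a hypothetical sketch, it assumes three things: the message still holds the original CR LF line breaks (grok runs before any gsub that rewrites them), the field labels are exactly as in the sample, and the IP address field contains a real address rather than the anonymized placeholder. Under those assumptions, each line can be captured into its own field. The field names are made up for illustration, and the date filter parses the Date/Time value in the sample's day/month/year format:

filter {
  grok {
    # one capture per "Label : value" line; \r\n matches the CR LF between lines
    match => { "message" => "Virus/malware : %{DATA:virus}\r\nComputer : %{DATA:computer}\r\nIP address : %{IP:ip_address}\r\nField1 : %{DATA:field1}\r\nFile : %{DATA:file_path}\r\nDate/Time : %{DATA:detected_at}\r\nResult : %{GREEDYDATA:result}" }
  }
  date {
    # hypothetical field from the grok above, e.g. 27/08/2015 15:48:38
    match => [ "detected_at", "dd/MM/yyyy HH:mm:ss" ]
  }
}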
