Issue parsing a multiline log with Filebeat multiline config and Logstash

I have the below multiline log:

2020-05-27 11:59:17 ----------------------------------------------------------------------
2020-05-27 11:59:17 Got context
2020-05-27 11:59:17 Raw context:
                    EMAIL=abc.def@example.com
                    NAME=abc.def
                    PAGER=+11111111111111
                    DATE=2020-05-27
                    AUTHOR=
                    COMMENT=
                    ADDRESS=1.1.1.1
                    ALIAS=abc.example.com
                    ATTEMPT=1
2020-05-27 11:59:17 Previous service hard state not known. Allowing all states.
2020-05-27 11:59:17 Computed variables:
                    URL=abc.example.com
                    STATE=UP                
2020-05-27 11:59:17 Preparing flexible notifications for abc.def
2020-05-27 11:59:17  channel with plugin sms
2020-05-27 11:59:17  - Skipping: set
2020-05-27 11:59:17  channel with plugin plain email        
2020-05-27 11:59:20 --------------------------------------------------------------------

This is my logstash config:

    input {
      stdin { }
    }

    filter {
      grok {
        match => { "message" => "(?m)%{GREEDYDATA:data}" }
      }
      if [data] {
        mutate {
          gsub => [
            "data", "^\s*", ""
          ]
        }
        mutate {
          gsub => [ "data", "\n", " " ]
        }
      }
    }

    output {
      stdout { codec => rubydebug }
    }

And Filebeat config

  multiline.pattern: '^[[:space:]][A-Za-z]* (?m)'
  multiline.negate: false
  multiline.match: after

What I need is this:
After the multiline log has been processed by filebeat, the log gets split into multiple events. One such event is

    2020-05-27 11:59:17 Raw context:
                        EMAIL=abc.def@example.com
                        NAME=abc.def
                        PAGER=+11111111111111
                        DATE=2020-05-27
                        AUTHOR=
                        COMMENT=
                        ADDRESS=1.1.1.1
                        ALIAS=abc.example.com
                        ATTEMPT=1

Now I want to extract each key-value pair using the kv filter, so that the event still shows as one single message like the above, but also has each key=value assigned to it as a field.

How can I achieve this?

I do not know filebeat well, but that regexp for the multiline looks wrong to me.
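For what it's worth, the usual approach for logs where every new event starts with a timestamp is to negate a timestamp pattern, so any line that does not start with a date gets appended to the previous one. An untested sketch:

```yaml
# Untested sketch: treat any line that does NOT begin with a date
# as a continuation of the previous line.
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} '
multiline.negate: true
multiline.match: after
```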

Anyway, assuming filebeat delivers a [message] like

"2020-05-27 11:59:17 Raw context:\n                    EMAIL=abc.def@example.com\n                    NAME=abc.def\n                    PAGER=+11111111111111\n                    DATE=2020-05-27\n                    AUTHOR=\n                    COMMENT=\n                    ADDRESS=1.1.1.1\n                    ALIAS=abc.example.com\n                    ATTEMPT=1"

Your filters will reduce that to

"2020-05-27 11:59:17 Raw context: EMAIL=abc.def@example.com NAME=abc.def PAGER=+11111111111111 DATE=2020-05-27 AUTHOR= COMMENT= ADDRESS=1.1.1.1 ALIAS=abc.example.com ATTEMPT=1"

Using grok to copy the message field to another field is expensive. It would be cheaper to do

mutate { copy => { "message" => "data" } }

Also, the two mutate filters can be combined

    gsub => [
      "data", "^\s*", "",
      "data", "\n", " "
    ]

If you just add

kv { source => "data" }

then you will get additional fields

      "DATE" => "2020-05-27",
     "EMAIL" => "abc.def@example.com",
   "ATTEMPT" => "1",
     "ALIAS" => "abc.example.com",
     "PAGER" => "+11111111111111",
      "NAME" => "abc.def",
    "AUTHOR" => "COMMENT=",
   "ADDRESS" => "1.1.1.1"

Note that two consecutive keys that do not have values are not well handled.

Thanks a lot for your response. I am a newbie to ELK.
In Kibana, using the logstash and filebeat config that I posted, I get 10 hits (results) with the data field like below:

t  data 2020-05-27 18:06:20 --------------------------------------------------------------------
t  data 2020-05-27 18:06:17 Previous service hard state not known. Allowing all states.
t  data	2020-05-27 18:06:17 Computed variables:
                    URL=abc.example.com
                    STATE=UP    
t  data 2020-05-27 18:06:17 Preparing flexible notifications for abc.def
t  data 2020-05-27 18:06:17 channel with plugin sms
t  data 2020-05-27 18:06:17 - Skipping: set
t  data 2020-05-27 18:06:17 channel with plugin plain email
t  data 2020-05-27 18:06:17 ----------------------------------------------------------------------
t  data 2020-05-27 18:06:17 Got context
t  data	2020-05-27 18:06:17 Raw context:
                    EMAIL=abc.def@example.com
                    NAME=abc.def
                PAGER=+11111111111111
                DATE=2020-05-27
                AUTHOR=
                COMMENT=
                ADDRESS=1.1.1.1
                ALIAS=abc.example.com
                ATTEMPT=1

With the Logstash filter config that you suggested:

    filter {
      mutate {
        copy => { "message" => "data" }
        gsub => [
          "data", "^\s*", "",
          "data", "\n", " "
        ]
      }
      kv { source => "data" }
    }

I have not tried it yet, but based on the above output in Kibana, I would like to know whether our assumption below about the message that filebeat delivers is correct. How can I check that?

"2020-05-27 11:59:17 Raw context:\n                    EMAIL=abc.def@example.com\n                    NAME=abc.def\n                    PAGER=+11111111111111\n                    DATE=2020-05-27\n                    AUTHOR=\n                    COMMENT=\n                    ADDRESS=1.1.1.1\n                    ALIAS=abc.example.com\n                    ATTEMPT=1"

Look at the message and data fields in kibana (expand an event in the discover tab and look at the JSON tab).

Thanks a lot for this tip about expanding the JSON. I tried the grok filter and I get the results perfectly fine.

Is there a way to ignore the data below, since it does not contain useful key-value pairs when processed by the kv filter, so that it does not appear in Kibana:

t  data 2020-05-27 18:06:20 --------------------------------------------------------------------
t  data 2020-05-27 18:06:17 Previous service hard state not known. Allowing all states.
t  data 2020-05-27 18:06:17 Preparing flexible notifications for abc.def
t  data 2020-05-27 18:06:17 channel with plugin sms
t  data 2020-05-27 18:06:17 - Skipping: set
t  data 2020-05-27 18:06:17 channel with plugin plain email
t  data 2020-05-27 18:06:17 ----------------------------------------------------------------------
t  data 2020-05-27 18:06:17 Got context

Those lines are single lines, not multiple lines combined. The multiline codec adds a tag (set by the multiline_tag option) to events only if they are a combination of multiple lines. So if you want those lines to disappear you could use

if "multiline" not in [tags] { drop {} }
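With the multiline codec, that would look something like this (a sketch only: the path is a placeholder, and multiline_tag is shown with its default value):

```
input {
  file {
    path => "/path/to/your.log"                  # placeholder path
    codec => multiline {
      pattern => "^[0-9]{4}-[0-9]{2}-[0-9]{2} "  # lines that start a new event
      negate => true
      what => "previous"
      multiline_tag => "multiline"               # default; added only to combined events
    }
  }
}
filter {
  if "multiline" not in [tags] { drop {} }
}
```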

Thanks a lot, all works now. I have a question regarding one of the values that I get for a key.

For example, this is the Filebeat message:

ADDRESS=1.1.1.1\n                    NAME=abc.def\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\n                    ATTEMPT=3\n                  

For the key STATE, I only see "wan" displayed; all the other values are ignored. How do I get the full value for this key as it is? Do I have to split here?

I tried like this but no luck:

    kv {
        source => "data"
        field_split => "\n\s*"
        value_split => "="
       }

Am I missing something?

Change your gsub to put double quotes around the value for each key...

    gsub => [
      "data", "^\s*", "",
      "data", "(=)([^\n]*)\n", '\1"\2" ',
      "data", "\n", " "
    ]

The third gsub is still required to handle the junk at the start of the data field. This improves the result because the kv filter ignores fields with empty values, so

AUTHOR=\"\" COMMENT=\"\"

no longer results in that bogus AUTHOR field on the event.
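Since mutate's gsub is Ruby's String#gsub underneath, you can check the effect of the three substitutions in plain Ruby (the string below is a shortened stand-in for your data field, with real newlines):

```ruby
# Shortened stand-in for the multiline [data] field (real newlines).
data = "2020-05-27 11:59:17 Raw context:\n" \
       "                    EMAIL=a@b.com\n" \
       "                    AUTHOR=\n" \
       "                    COMMENT=\n" \
       "                    ATTEMPT=1"

data = data.gsub(/^\s*/, "")                  # strip leading whitespace on every line
data = data.gsub(/(=)([^\n]*)\n/, '\1"\2" ')  # wrap each value in double quotes
data = data.gsub(/\n/, " ")                   # flatten the newline left after "Raw context:"

puts data
# 2020-05-27 11:59:17 Raw context: EMAIL="a@b.com" AUTHOR="" COMMENT="" ATTEMPT=1
```

With the values quoted, kv sees AUTHOR="" and COMMENT="" as empty and drops them instead of pairing AUTHOR with the text COMMENT=.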

Sorry, but the value of the key STATE is still "wan"

    {
              "host" => "pqr",
            "AUTHOR" => "\\n",
             "STATE" => "wan",
             "ALIAS" => "abc.example.com\\n",
           "ATTEMPT" => "1",
             "PAGER" => "+11111111111111\\n",
              "data" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    NAME=abc.def\\n                    PAGER=+11111111111111\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ADDRESS=1.1.1.1\\n                    ALIAS=abc.example.com\\n                    ATTEMPT=1",
             "EMAIL" => "abc.def@example.com\\n",
           "COMMENT" => "\\n",
        "@timestamp" => 2020-05-27T19:18:57.370Z,
           "message" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    NAME=abc.def\\n                    PAGER=+11111111111111\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ADDRESS=1.1.1.1\\n                    ALIAS=abc.example.com\\n                    ATTEMPT=1",
              "DATE" => "2020-05-27\\n",
              "NAME" => "abc.def\\n",
           "ADDRESS" => "1.1.1.1\\n",
          "@version" => "1"
    }

This is the updated filter:

    filter {
      mutate {
        copy => { "message" => "data" }
        gsub => [
          "data", "\s*", "",
          "data", "(=)([^\n]*)\n", '\1"\2" ',
          "data", "\n", " "
        ]
      }
      kv {
        source => "data"
        field_split => "\n\s*"
        value_split => "="
      }
    }

Change the kv back to

kv { source => "data" }

It still remains the same.

{
       "COMMENT" => "\\n",
      "@version" => "1",
          "host" => "pqr",
         "ALIAS" => "abc.example.com\\n",
          "NAME" => "abc.def\\n",
    "@timestamp" => 2020-05-27T19:45:50.644Z,
         "STATE" => "wan",
        "AUTHOR" => "\\n",
         "EMAIL" => "abc.def@example.com\\n",
       "ATTEMPT" => "1",
       "ADDRESS" => "1.1.1.1\\n",
       "message" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    NAME=abc.def\\n
  PAGER=+11111111111111\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots
/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ADDRESS=1.1.1.1\\n
          ALIAS=abc.example.com\\n                    ATTEMPT=1",
          "DATE" => "2020-05-27\\n",
          "data" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    NAME=abc.def\\n
  PAGER=+11111111111111\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots
/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ADDRESS=1.1.1.1\\n
          ALIAS=abc.example.com\\n                    ATTEMPT=1",
         "PAGER" => "+11111111111111\\n"
}

Your message contains \n in several places. That is not a newline; it is the character \ followed by the character n. Changing \n in the gsubs to \\n might help.
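In Ruby terms (and gsub patterns are Ruby regexes), the difference looks like this:

```ruby
escaped = 'a\nb'  # single quotes: a literal backslash followed by n -- 4 characters
real    = "a\nb"  # double quotes: an actual newline character -- 3 characters

escaped.match?(/\n/)   # false: /\n/ only matches a real newline
escaped.match?(/\\n/)  # true:  /\\n/ matches the two-character sequence \n
```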

filter {
  mutate {
    copy => { "message" => "data" }
    gsub => [
      "data", "\s*", "",
      "data", "(=)([^\\n]*)\\n", '\1"\2" ',
      "data", "\\n", " "
    ]
  }
  kv {
    source => "data"
  }
}

I made this change in the filter, without any luck.

Oh, that's not going to work. The intent of the pattern was "characters that are not newline followed by newline". For a pair of characters you will have to change it to a lookahead assertion. I cannot test it right now, but maybe

"data", "(=)(.*)(?=\\n)", '\1"\2" ',

Hopefully the .* will not get greedy.

Sorry. It doesn't work.

It was not working because for some fields the .* was getting greedy. How about this?

gsub => [
    "data", "(=)([^=]*)(?=\\n)", '\1"\2"',
    "data", "\\n\s*", " "
]
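Checked in plain Ruby, with a single-quoted string so that each \n is a literal backslash-n as in your message (the STATE value is a shortened stand-in for the long one in your event), those two substitutions give:

```ruby
# Single-quoted: each \n is a literal backslash + n, as in the Filebeat message.
# The STATE value is a shortened stand-in for the long one in the real event.
data = 'EMAIL=abc.def@example.com\n                    ' \
       'STATE=wan abc def\n                    ' \
       'AUTHOR=\n                    ' \
       'COMMENT=\n                    ' \
       'ATTEMPT=1'

data = data.gsub(/(=)([^=]*)(?=\\n)/, '\1"\2"')  # quote each value; [^=]* cannot run into the next key
data = data.gsub(/\\n\s*/, ' ')                  # collapse each literal \n plus indentation into one space

puts data
# EMAIL="abc.def@example.com" STATE="wan abc def" AUTHOR="" COMMENT="" ATTEMPT=1
```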

I used this now and the issue remains the same:

filter {
  mutate {
    copy => { "message" => "data" }
    gsub => [
      "data", "(=)([^=]*)(?=\\n)", '\1"\2"',
      "data", "\\n\s*", " "
    ]
  }
  kv {
    source => "data"
  }
}

The problem I have is that the rubydebug [message] field you show appears to have both \\n and unescaped newlines. I have no idea how rubydebug could possibly display that, so I do not know the value of your [message] field. I cannot help.

Are you referring to this?

 "message" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    NAME=abc.def\\n                    PAGER=+11111111111111\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ADDRESS=1.1.1.1\\n                    ALIAS=abc.example.com\\n                    ATTEMPT=1",

No, if you go 8 posts back from your post you will see a post that contains the rubydebug of a complete event. That has a newline embedded just before PAGER and others further on.

I am running out of ideas. This configuration

input { generator { count => 1 lines => [ "2020-05-27 11:59:17 Raw context:\n                    EMAIL=abc.def@example.com\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\n                    DATE=2020-05-27\n                    AUTHOR=\n                    COMMENT=\n                    ATTEMPT=1" ] } }
filter {
  mutate { copy => { "message" => "data" } }
  if [data] {
    mutate {
      gsub => [
        "data", "(=)([^=]*)(?=\\n)", '\1"\2"',
        "data", "\\n\s*", " "
      ]
    }
    kv { source => "data" }
  }
}
output { stdout { codec => rubydebug { metadata => false } } }

produces

   "message" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ATTEMPT=1",
      "data" => "2020-05-27 11:59:17 Raw context: EMAIL=\"abc.def@example.com\" STATE=\"wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\" DATE=\"2020-05-27\" AUTHOR=\"\" COMMENT=\"\" ATTEMPT=1",
     "STATE" => "wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/",
      "DATE" => "2020-05-27",
     "EMAIL" => "abc.def@example.com",
   "ATTEMPT" => "1"

I had to remove a couple of fields from the original message for that example because otherwise the browser introduces the same unescaped newlines that were in your post. However, even if I use the [message] value you gave then it works...

   "message" => "2020-05-27 11:59:17 Raw context:\\n                    EMAIL=abc.def@example.com\\n                    NAME=abc.def\\n                    PAGER=+11111111111111\\n                    STATE=wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\\n                    DATE=2020-05-27\\n                    AUTHOR=\\n                    COMMENT=\\n                    ADDRESS=1.1.1.1\\n                    ALIAS=abc.example.com\\n                    ATTEMPT=1",
      "data" => "2020-05-27 11:59:17 Raw context: EMAIL=\"abc.def@example.com\" NAME=\"abc.def\" PAGER=\"+11111111111111\" STATE=\"wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/\" DATE=\"2020-05-27\" AUTHOR=\"\" COMMENT=\"\" ADDRESS=\"1.1.1.1\" ALIAS=\"abc.example.com\" ATTEMPT=1",
   "ADDRESS" => "1.1.1.1",
     "ALIAS" => "abc.example.com",
   "ATTEMPT" => "1",
     "PAGER" => "+11111111111111",
      "NAME" => "abc.def",
     "STATE" => "wan abc site:def cake-agent jlp bl3 mississipi algebra dev app abc1 nano /nano/ack/divgen/mississipi/bots/linux/",
      "DATE" => "2020-05-27",
     "EMAIL" => "abc.def@example.com"