KV Filtering data between square brackets

Hi,

How can I parse data that has two layers of square brackets?

  1. Inner square brackets
  2. Outer square brackets

I only want to treat the data inside the outer square brackets as key-value pairs.

Example: [filename=0_578cc[R2][]2 veckor f=f600re (2).doc]

In the above example I want to take filename as the key, but the parser also splits on the inner square brackets, which breaks my logic.
I am using grok and the kv filter with the following config:

grok {
  match => [ "message", "%{SYSLOGTIMESTAMP:eventtime}\s(?[^\s])\s(?[^\s])\s(?[^\s]*)\s%{GREEDYDATA:msg}"]
}

kv {
  source => "msg"
  field_split => "]["
  value_split => "="
}
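Part of the problem is how field_split is interpreted: the kv filter documentation describes it as a string of single-character delimiters, so "][" means "split on ] or on [", not "split on the two-character sequence ][". The Python sketch below approximates only that splitting step, using Python's re module rather than the plugin's own code, to show how the example value gets cut apart.

```python
import re

# Example value from the question.
line = "[filename=0_578cc[R2][]2 veckor f=f600re (2).doc]"

# Approximation of kv's field_split => "][": every character in the
# string is its own delimiter, so this splits on "]" OR on "[".
field_split_chars = "]["
pieces = re.split("[" + re.escape(field_split_chars) + "]", line)

# Drop the empty strings produced by adjacent or leading/trailing delimiters.
fields = [p for p in pieces if p]
print(fields)
```

The filename value is truncated at 0_578cc, and the fragments R2 and "2 veckor f=f600re (2).doc" become separate fields of their own.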

Can anyone suggest what needs to be done?

Can you provide more example inputs, and state explicitly what you hope to extract from each? It's really hard to come up with a generic pattern that will work without clearly knowing what you want.

Hi, thanks for your reply. Below is a typical example:

Oct 3 20:13:48 WIN-C6MQ3RMMBL IndexerI103 [event-type=indexation][guid=D2B636B35A54433AB2916FEA4D180538][filename=0_1d1ea[**R4][**]2011-05-25 Rumsf=f600rdelning Byggnad_Plan_Rum.xls][fileext=xls][source=Internal][size=118784][converter=_FilenameToHtmlNoBlob][success=True][message=ok][durationExecution=328][durationConversion=0][durationExts=0][durationLemma=16][durationIndexPacket=0][durationCache=16]

In this input, the value of the key 'filename' itself contains "][". This breaks my approach of extracting key-value pairs with "][" as the field split.

I cannot tell what you expect to extract from this data; can you provide a mapping of what keys you expect to extract, and what you expect the values to be, exactly?


There was recently a new release of the kv filter plugin that lets us specify a regular expression for the field splitter and the value splitter (the field_split_pattern and value_split_pattern options). The following may work, but it will not be especially fast, because the regex engine will need to do a lot of backtracking to capture the right bits. First, update the plugin:

bin/logstash-plugin update logstash-filter-kv

Once you have done so, we can define the pattern to split fields on one of the following:

  • the start of a string followed by an open-square-bracket ^\[ (cheap); OR
  • a close-square-bracket followed by the end-of-line \]$ (cheap); OR
  • a close-square-bracket and open-square-bracket that are followed by something that looks like a key \]\[(?=[A-Za-z0-9]+=) (expensive; may need to backtrack)

Put it together, and we get:

filter {
  kv {
    field_split_pattern => "(?:^\[|\]$|\]\[(?=[A-Za-z0-9]+=))"
  }
}

With the above pattern, I get:

{
                 "source" => "Internal",
                   "host" => "castrovel.local",
                "success" => "True",
     "durationConversion" => "0",
                   "size" => "118784",
      "durationExecution" => "328",
              "converter" => "_FilenameToHtmlNoBlob",
             "@timestamp" => 2018-03-29T18:43:20.270Z,
                   "guid" => "D2B636B35A54433AB2916FEA4D180538",
                "fileext" => "xls",
           "durationExts" => "0",
          "durationLemma" => "16",
             "event-type" => "indexation",
    "durationIndexPacket" => "0",
               "@version" => "1",
                "message" => "ok",
          "durationCache" => "16",
               "filename" => "0_1d1ea[**R4][**]2011-05-25 Rumsf=f600rdelning Byggnad_Plan_Rum.xls"
}
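To check what the combined pattern does outside Logstash, here is a rough Python emulation: split the text on the same regex, then split each field on the first "=". It is only an approximation of the kv filter (Python's re module instead of the plugin's Ruby regex engine, and none of kv's other options such as trimming or prefixes), and parse_kv is a hypothetical helper name. The sample is a shortened version of the bracketed part of the log line from this thread.

```python
import re

# Same field-split regex as field_split_pattern in the config above.
FIELD_SPLIT = re.compile(r"(?:^\[|\]$|\]\[(?=[A-Za-z0-9]+=))")

def parse_kv(text):
    """Hypothetical helper: rough emulation of kv with value_split "="."""
    result = {}
    for field in FIELD_SPLIT.split(text):
        if not field:
            continue  # empty strings left over from ^\[ and \]$
        key, sep, value = field.partition("=")  # split on the FIRST "=" only
        if sep:
            result[key] = value
    return result

# Bracketed part of the example log line (shortened).
sample = (
    "[event-type=indexation][guid=D2B636B35A54433AB2916FEA4D180538]"
    "[filename=0_1d1ea[**R4][**]2011-05-25 Rumsf=f600rdelning Byggnad_Plan_Rum.xls]"
    "[fileext=xls][size=118784]"
)
fields = parse_kv(sample)
print(fields["filename"])
```

The inner "][" in the filename is followed by "**", not by something that looks like key=, so the lookahead fails there and the filename keeps its inner brackets, matching the Logstash output.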

Hi,

Thanks. This works for some of the logs, but it does not extract "user-id" from the log below:

Sep 2 14:32:57 WIN-5KCJEHGCVCM MyApp2 [event-type=search.text][guid=194B71D7E84A4AF6B9C21CBFB60E9851][user-id=ad|S-1-5-21-1292428093-776561741-11674531-9345][profile=rdSearch][session-id=E908D509BEF44A91AA1489ACC5C49461][duration=594][result-id=B4D9BDFD3D744E46A1BC75729788F51D][result-count=78043][text=test]

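user-id contains a hyphen, which the lookahead's character class does not allow (the same is true of session-id, result-id, and result-count; event-type works only because it comes right after the leading ^\[). Change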

field_split_pattern => "(?:^\[|\]$|\]\[(?=[A-Za-z0-9]+=))"

to

field_split_pattern => "(?:^\[|\]$|\]\[(?=[-A-Za-z0-9]+=))"
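The effect of adding "-" to the character class can be checked with the same kind of rough Python emulation: split on the pattern, then split each field on the first "=". This uses Python's re module, not the kv plugin itself, and parse_kv is a hypothetical helper name; the sample is a shortened version of the user-id log line.

```python
import re

OLD = re.compile(r"(?:^\[|\]$|\]\[(?=[A-Za-z0-9]+=))")
NEW = re.compile(r"(?:^\[|\]$|\]\[(?=[-A-Za-z0-9]+=))")  # "-" now allowed in keys

def parse_kv(text, pattern):
    """Hypothetical helper: split fields on pattern, then key/value on the first "="."""
    result = {}
    for field in pattern.split(text):
        key, sep, value = field.partition("=")
        if sep:  # skips the empty strings left by ^\[ and \]$
            result[key] = value
    return result

sample = (
    "[event-type=search.text][guid=194B71D7E84A4AF6B9C21CBFB60E9851]"
    "[user-id=ad|S-1-5-21-1292428093-776561741-11674531-9345][profile=rdSearch]"
    "[session-id=E908D509BEF44A91AA1489ACC5C49461][duration=594]"
)

old = parse_kv(sample, OLD)
new = parse_kv(sample, NEW)
print("user-id" in old, "user-id" in new)  # False True
```

With the original pattern, no split happens before user-id or session-id, so those fields end up glued onto the values of guid and profile.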

@Badger, thank you. This worked well.

@yaauie, Thank you mate.
