DISSECT FILTER, field not always specified

I was using a grok filter until I got the "Timeout executing grok : Value too large to output" message.
To solve this issue, I would like to try a dissect filter.
I have big files, and sometimes some fields are not specified. In grok, I usually write (%{MYPATTERN:myFIELD})? for an optional field. What's the equivalent in the dissect filter?

Thank you for your answers.

Regards

What do your lines look like?

Show me an example of each of the different lines.

I might suggest a grok/dissect combo or dissect only.

Hi,

A line looks like this:

4/28/2017 4:34:19 AM True Bed 43 00589614_20170428_10 U378ERTYT U378ERTYT=LLD 8 OutOf Ikea OIP 0 0.0987411235 0..0987411238498 1.09874112 1.0987411299 0 0..098741125753 0.0.09874112835753 -0.0.0987411218 0..0987411223 0..09874112624 0..098741123 0..09874112913 0.907499999999595 1..098741124 0.0.09874112323 0.3.098741124 0.6.09874112193 0..098741123 0..0987411295 1..0987411214 OutOf U378ERTYT NaN 0 NANA

In the next line, some fields are not specified, for example:
4/22/2017 4:48:19 AM True Bed 43 00581554_58956528_10 UERT25YT UERTY25T=LLD OutOf Ikea OIP 0 0.0987411235 0..0987411238498 1.09874112 1.0987411299 0 0..098741125753 0.0.09874112835753 -0.0.0987411218 0..0987411223 0..09874112624 0..098741123 0..09874112913 0.907499999999595 0.0.09874112323 0.3.098741124 0..098741123 0..0987411295 1..0987411214 OutOf U378ERTYT NaN 0 NANA

Thank you for your help

Hi,

First of all, thank you for your interest.
I just wanted to know whether you have an idea for my issue.

Regards

A multi-stage config would look something like this:

input {
  stdin {}
}

filter {
  dissect {
    mapping => {
      message => "%{left}	OutOf	Ikea	%{middle}"
    }
  }
  dissect {
    mapping => {
      left => "%{date}	%{fTrue}	%{fBed}	%{f43}	%{f00589614_20170428_10}	%{fU378ERTYT}	%{fU378ERTYT=LLD}	%{f8}"
      middle => "%{togrok}	OutOf	%{right}"
    }
  }
  dissect {
    mapping => {
      right => "%{fU378ERTYT}	%{fNaN}	%{f0}	%{fNANA}"
    }
  }
}

output {
  stdout { codec => rubydebug }
}

The separators inside these mappings are literal tab characters, so you need to use an editor that does not auto-convert tabs into spaces.
This gives the following output:

4/22/2017 4:48:19 AM	True	Bed	43	00581554_58956528_10	UERT25YT	UERTY25T=LLD	OutOf	Ikea	OIP	0 0.0987411235	0..0987411238498	1.09874112	1.0987411299	0	0..098741125753	0.0.09874112835753 -0.0.0987411218	0..0987411223	0..09874112624	0..098741123	0..09874112913	0.907499999999595 0.0.09874112323	0.3.098741124	0..098741123	0..0987411295	1..0987411214	OutOf	U378ERTYT	NaN	0 NANA
{
                     "date" => "4/22/2017 4:48:19 AM",
                   "middle" => "OIP\t0 0.0987411235\t0..0987411238498\t1.09874112\t1.0987411299\t0\t0..098741125753\t0.0.09874112835753 -0.0.0987411218\t0..0987411223\t0..09874112624\t0..098741123\t0..09874112913\t0.907499999999595 0.0.09874112323\t0.3.098741124\t0..098741123\t0..0987411295\t1..0987411214\tOutOf\tU378ERTYT\tNaN\t0 NANA",
                      "f43" => "43",
                       "f0" => "",
                    "right" => "U378ERTYT\tNaN\t0 NANA",
                  "message" => "4/22/2017 4:48:19 AM\tTrue\tBed\t43\t00581554_58956528_10\tUERT25YT\tUERTY25T=LLD\tOutOf\tIkea\tOIP\t0 0.0987411235\t0..0987411238498\t1.09874112\t1.0987411299\t0\t0..098741125753\t0.0.09874112835753 -0.0.0987411218\t0..0987411223\t0..09874112624\t0..098741123\t0..09874112913\t0.907499999999595 0.0.09874112323\t0.3.098741124\t0..098741123\t0..0987411295\t1..0987411214\tOutOf\tU378ERTYT\tNaN\t0 NANA",
               "fU378ERTYT" => "U378ERTYT",
                    "fNANA" => "0 NANA",
                    "fTrue" => "True",
                       "f8" => "UERTY25T=LLD",
               "@timestamp" => 2017-06-12T10:40:19.035Z,
                     "fBed" => "Bed",
           "fU378ERTYT=LLD" => "",
                     "left" => "4/22/2017 4:48:19 AM\tTrue\tBed\t43\t00581554_58956528_10\tUERT25YT\tUERTY25T=LLD",
                 "@version" => "1",
                     "host" => "Elastics-MacBook-Pro.local",
                     "fNaN" => "NaN",
                   "togrok" => "OIP\t0 0.0987411235\t0..0987411238498\t1.09874112\t1.0987411299\t0\t0..098741125753\t0.0.09874112835753 -0.0.0987411218\t0..0987411223\t0..09874112624\t0..098741123\t0..09874112913\t0.907499999999595 0.0.09874112323\t0.3.098741124\t0..098741123\t0..0987411295\t1..0987411214",
    "f00589614_20170428_10" => "00581554_58956528_10"
}
4/28/2017 4:34:19 AM	True	Bed	43	00589614_20170428_10	U378ERTYT	U378ERTYT=LLD	8	OutOf	Ikea	OIP 0	0.0987411235	0..0987411238498	1.09874112	1.0987411299	0	0..098741125753	0.0.09874112835753 -0.0.0987411218	0..0987411223	0..09874112624	0..098741123	0..09874112913	0.907499999999595 1..0987411240.0.09874112323	0.3.098741124	0.6.09874112193	0..098741123	0..0987411295 1..0987411214	OutOf	U378ERTYT	NaN	0	NANA
{
                     "date" => "4/28/2017 4:34:19 AM",
                   "middle" => "OIP 0\t0.0987411235\t0..0987411238498\t1.09874112\t1.0987411299\t0\t0..098741125753\t0.0.09874112835753 -0.0.0987411218\t0..0987411223\t0..09874112624\t0..098741123\t0..09874112913\t0.907499999999595 1..098741124\t0.0.09874112323\t0.3.098741124\t0.6.09874112193\t0..098741123\t0..0987411295 1..0987411214\tOutOf\tU378ERTYT\tNaN\t0\tNANA",
                      "f43" => "43",
                       "f0" => "0",
                    "right" => "U378ERTYT\tNaN\t0\tNANA",
                  "message" => "4/28/2017 4:34:19 AM\tTrue\tBed\t43\t00589614_20170428_10\tU378ERTYT\tU378ERTYT=LLD\t8\tOutOf\tIkea\tOIP 0\t0.0987411235\t0..0987411238498\t1.09874112\t1.0987411299\t0\t0..098741125753\t0.0.09874112835753 -0.0.0987411218\t0..0987411223\t0..09874112624\t0..098741123\t0..09874112913\t0.907499999999595 1..098741124\t0.0.09874112323\t0.3.098741124\t0.6.09874112193\t0..098741123\t0..0987411295 1..0987411214\tOutOf\tU378ERTYT\tNaN\t0\tNANA",
               "fU378ERTYT" => "U378ERTYT",
                    "fNANA" => "NANA",
                    "fTrue" => "True",
                       "f8" => "8",
               "@timestamp" => 2017-06-12T10:40:43.794Z,
                     "fBed" => "Bed",
           "fU378ERTYT=LLD" => "U378ERTYT=LLD",
                     "left" => "4/28/2017 4:34:19 AM\tTrue\tBed\t43\t00589614_20170428_10\tU378ERTYT\tU378ERTYT=LLD\t8",
                 "@version" => "1",
                     "host" => "Elastics-MacBook-Pro.local",
                     "fNaN" => "NaN",
                   "togrok" => "OIP 0\t0.0987411235\t0..0987411238498\t1.09874112\t1.0987411299\t0\t0..098741125753\t0.0.09874112835753 -0.0.0987411218\t0..0987411223\t0..09874112624\t0..098741123\t0..09874112913\t0.907499999999595 1..098741124\t0.0.09874112323\t0.3.098741124\t0.6.09874112193\t0..098741123\t0..0987411295 1..0987411214",
    "f00589614_20170428_10" => "00589614_20170428_10"
}

Because each dissect depends on the result of the previous one, you probably want to guard the later ones with a conditional that checks for the _dissectfailure tag.
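A sketch of the same filter section with those guards, assuming the dissect filter's default tag_on_failure value (["_dissectfailure"]). The separators are literal tab characters, as in the config above. One change from that config: the first field of `right` is renamed to the hypothetical fU378ERTYT_right, because reusing fU378ERTYT overwrites the value captured from `left` (in the first output above, fU378ERTYT shows U378ERTYT even though that line's sixth field is UERT25YT).

```
filter {
  # Stage 1: split the line around the fixed "OutOf<TAB>Ikea" marker
  dissect {
    mapping => {
      message => "%{left}	OutOf	Ikea	%{middle}"
    }
  }

  # Stage 2: only runs if stage 1 did not tag the event as failed
  if "_dissectfailure" not in [tags] {
    dissect {
      mapping => {
        left => "%{date}	%{fTrue}	%{fBed}	%{f43}	%{f00589614_20170428_10}	%{fU378ERTYT}	%{fU378ERTYT=LLD}	%{f8}"
        middle => "%{togrok}	OutOf	%{right}"
      }
    }
  }

  # Stage 3: only runs if stages 1 and 2 both succeeded
  if "_dissectfailure" not in [tags] {
    dissect {
      mapping => {
        # hypothetical name, so the value from `left` is not overwritten
        right => "%{fU378ERTYT_right}	%{fNaN}	%{f0}	%{fNANA}"
      }
    }
  }
}
```

Events that fail any stage keep the _dissectfailure tag, so you can route them to a separate output and inspect them.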

Now you have a togrok field that contains only the varying part of the line.
You can build a more targeted grok pattern for just this text.
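As a sketch, a targeted grok on togrok can anchor the pattern at both ends (^ and $), which generally lets a non-matching value fail quickly instead of running into the grok timeout. The field names code, n1 and values are hypothetical placeholders, and [ \t] accepts either a space or a tab after OIP because both appear in the sample lines.

```
filter {
  if "_dissectfailure" not in [tags] {
    grok {
      # Anchored at start (^) and end ($); \t is passed to the regex engine as a tab.
      # code / n1 / values are hypothetical names, adjust to what the columns mean.
      match => {
        "togrok" => "^%{WORD:code}[ \t]%{NUMBER:n1}[ \t]%{GREEDYDATA:values}$"
      }
    }
  }
}
```

Once you know what each column means, you would replace GREEDYDATA with explicit patterns for those columns.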

I don't know what the numbers in the togrok section represent, or whether OIP followed by a space character is significantly different from OIP followed by a tab character when it comes to determining which pattern of fields will follow.
Perhaps something like...

if [togrok] =~ /^OIP / {
  # grok or dissect pattern 1 (OIP followed by a space)
} else {
  # grok or dissect pattern 2 (OIP followed by a tab)
}

Hope this helps.
