Best Process For Counting Lines in Multiline Event

I need a way to count lines in a multiline event using Logstash 5.1.2. I've tried simply running grok on the multiline event, but when an event has as many as 15 lines, I consistently get grok timeouts (logstash-filter-grok-3.3.1). I've tried many workarounds to simplify the grok pattern, yet I still receive timeouts.

A typical message indicating a _groktimeout looks like this:

  0x7fff     0     0xa0580      0xa0580           100          if0      DA2WIBL1-31-io   10.126.144.51
  0x7fff     0     0xa0580      0xa0400           100          if0      DA2WIBL1-28-io   10.126.144.50
  0x7fff     0     0xa0580      0xa0180           100          if0      DA2EIBL1-21-io   10.126.144.44
  0x7fff     0     0xa0580      0xa0140           100          if0       DA2EIBL1-3-io   10.126.144.40
  0x7fff     0     0xa0580      0xa0100           100          if0      DA2EIBL1-27-io   10.126.144.46
  0x7fff     0     0xa0580      0xa00c0           100          if0      DA2EIBL1-24-io   10.126.144.45
  0x7fff     0     0xa0580      0xa0080           100          if0      DA2EIBL1-12-io   10.126.144.43
  0x7fff     0     0xa0580      0xa0040           100          if0      DA2EIBL1-30-io   10.126.144.47
  0x7fff     0     0xa0580      0xa0000           100          if0       DA2EIBL1-6-io   10.126.144.41
  0x7fff     0     0xa0580      0x9d480           100          if0       DA2WIBL1-3-io   10.126.144.27
  0x7fff     0     0xa0580      0x9d340           100          if0       DA2WIBL1-6-io   10.126.144.28
  0x7fff     0     0xa0580      0xa0040           100          if0      DA2EIBL1-30-io   10.126.144.47
  0x7fff     0     0xa0580      0xa0000           100          if0       DA2EIBL1-6-io   10.126.144.41
  0x7fff     0     0xa0580      0x9d480           100          if0       DA2WIBL1-3-io   10.126.144.27
  0x7fff     0     0xa0580      0x9d340           100          if0       DA2WIBL1-6-io   10.126.144.28
DA2EIBL1-24-io# spawn ssh admin@10.126.64.232

Is there a way to simply count the number of lines in this multiline event? I need to know the number of lines that start with '0x7fff'; it is not always 15, but it is never more than 15. Otherwise, if I do end up grokking these lines, what would be the most efficient patterns to use? I'll share some of the patterns I've attempted below.

Or does anyone have a better architecture idea? I can't do much to modify the input data but am open to suggestions.
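For reference, the count being asked for is easy to express outside of grok. The following is a minimal Ruby sketch, not a tested Logstash solution: the helper name `count_prefixed_lines` is hypothetical, and the sample is built from three rows of the event shown above.

```ruby
# Hypothetical helper: counts the lines of a multiline message whose first
# non-blank text starts with the given prefix (default '0x7fff').
def count_prefixed_lines(message, prefix = '0x7fff')
  # to_s guards against a nil message; lstrip ignores the leading indent.
  message.to_s.each_line.count { |line| line.lstrip.start_with?(prefix) }
end

# Sample built from rows of the example event: two table rows plus the
# trailing prompt line, joined with newlines as the multiline codec would.
sample = [
  '  0x7fff     0     0xa0580      0xa0580           100          if0      DA2WIBL1-31-io   10.126.144.51',
  '  0x7fff     0     0xa0580      0xa0400           100          if0      DA2WIBL1-28-io   10.126.144.50',
  'DA2EIBL1-24-io# spawn ssh admin@10.126.64.232'
].join("\n")

puts count_prefixed_lines(sample) # prints 2
```

Because this only splits the message into lines and checks a literal prefix, it does no regex backtracking at all.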

In all these patterns, I counted the number of '100's present, which is representative of the number of lines (e.g. 12 lines sum to 1200). Patterns I've attempted:

Matching all fields that are variable, with one pattern for every potential number of lines (e.g. cluster15 for 15 lines, cluster14 for 14 lines, etc.):

cluster15 ^  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri1:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri2:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri3:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri4:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri5:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri6:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri7:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri8:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri9:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri10:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri11:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri12:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri13:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri14:float}          if0%{DATA:switch_ip}\n
  0x7fff     0     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri15:float}          if0%{DATA:switch_ip}\n
.*$

Matching only the '100's, so I can sum them up to get the number of lines, one pattern for every potential number of lines:

cluster15 ^  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri1:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri2:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri3:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri4:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri5:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri6:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri7:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri8:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri9:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri10:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri11:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri12:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri13:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri14:float}          if0.*\n
  0x7fff     0     0x.*      0x.*           %{BASE10NUM:gateway_pri15:float}          if0.*\n
.*$

Attempting to use one pattern to match any number of lines (tried with and without the optional spacing):

cluster ^  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri1:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri2:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri3:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri4:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri5:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri6:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri7:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri8:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri9:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri10:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri11:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri12:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri13:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri14:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip}\n)?
(  0x7fff     %{BASE10NUM:vlan}     %{DATA:master_sn}      %{DATA:gateway_sn}           %{BASE10NUM:gateway_pri15:float}          %{WORD:if_name}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch}   ( )?( )?( )?( )?( )?( )?( )?%{DATA:switch_ip})?\n
DA%{GREEDYDATA:throwaway}$

None of these patterns works for more than a minute or two before grok starts timing out consistently. What's even more peculiar is that when I restart Logstash, the multiline codec starts reading from the beginning of the file it's tailing, and all multiline events appear to parse correctly until Logstash gets to the end of the file. From that point on, all subsequent multiline events fail to parse.

Input and filter config for the sake of completeness:

input {
  file {
    path => "/var/log/cluster/cluster.log"
#    start_position => beginning
    codec => multiline {
      pattern => "0x7fff"
      what => "next"
      auto_flush_interval => 5
    }
  }
}

filter {
  if "multiline" in [tags] {
    grok {
      patterns_dir => ["/opt/logstash/patterns/cluster"]
      match => { "message" => ["%{cluster15}","%{cluster14}","%{cluster13}","%{cluster12}","%{cluster11}","%{cluster10}","%{cluster9}","%{cluster8}","%{cluster7}","%{cluster6}","%{cluster5}","%{cluster4}","%{cluster3}","%{cluster2}","%{cluster1}" ] }
      add_field => {"multiline_message" => "uninitialized"}
      tag_on_failure => ["_clusterparsefailure"]
    }
  }
}
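
One grok-free alternative, sketched here under assumptions and not tested against Logstash 5.1.2: a ruby filter (logstash-filter-ruby) that counts the '0x7fff' rows directly. It uses the `event.get` / `event.set` event API available in Logstash 5.x; the field name `cluster_line_count` is hypothetical, and the conditional is copied from the config above.

```
filter {
  if "multiline" in [tags] {
    ruby {
      # Count rows whose first non-blank text is 0x7fff.
      # cluster_line_count is a hypothetical field name.
      code => "
        msg = event.get('message').to_s
        event.set('cluster_line_count', msg.each_line.count { |l| l.lstrip.start_with?('0x7fff') })
      "
    }
  }
}
```

The Ruby code deliberately avoids regular expressions and backslashes, so there is nothing to backtrack and no escaping to worry about inside the config string.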
