_grokparsefailure but it works in the grok debugger


(bus) #1

Hello everybody, I use Logstash to grok log lines.

For example, one of these lines is:

INFO;0000;000003;* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

I use a first grok to split this event into four fields, like this:

match => { "message" => [ "%{WORD:TYPE};%{DATA:ID1};%{NUMBER:ID2};%{GREEDYDATA:DESCRIPTION}" ] }

So now:

"DESCRIPTION" => "* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *"

Then I grok this line in the grok debugger like this:

%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}

Result:

{
  "NOM_BATCH": [
    "WKF998MT"
  ],
  "DATE_JOURNEE_BATCH": [
    "20180621"
  ],
  "DATE_EXECUTION_BATCH": [
    "20180621"
  ],
  "HEURE_EXECUTION_BATCH": [
    "1301"
  ]
} 

That's correct. But in Kibana, I get a _grokparsefailure on this line....

To be clear: this same grok works very well on my other log lines, for example these:

ACT003MT-20180606-20180607-0014.log
RLC006MT-20180509-20180511-0228.log
RLC004-20180530-20180530-2236.log
WKF999-20180422-20180422-1937-34-120.log
WKF997-20180622-20180622-2127-42-9000-03-20180622-212829.log

It's very strange. Can you help me?
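For reference, grok patterns are just named regular expressions. A minimal stand-alone Ruby sketch (an assumption: `WORD` and `BASE16NUM` below are copied from the stock grok-patterns definitions; verify against the patterns shipped with your Logstash version) shows why this expression matches such a file name:

```ruby
# Stand-alone check outside Logstash. WORD and BASE16NUM are assumed
# to be the stock grok-patterns definitions.
WORD      = /\b\w+\b/
BASE16NUM = /(?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))/

PATTERN = /(?<NOM_BATCH>#{WORD})-(?<DATE_JOURNEE_BATCH>#{BASE16NUM})-(?<DATE_EXECUTION_BATCH>#{BASE16NUM})-(?<HEURE_EXECUTION_BATCH>#{BASE16NUM})/

line = '* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *'
m = line.match(PATTERN)
puts m[:NOM_BATCH]              # WKF998MT
puts m[:HEURE_EXECUTION_BATCH]  # 1301
```

Note that `BASE16NUM` accepts any run of hex digits, so it happily matches decimal date stamps like `20180621` even though they are not hexadecimal values.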


(Magnus Bäck) #2

Build your expression gradually. Start with the simplest possible expression, ^%{WORD:NOM_BATCH}. Does that work? Yes? Then continue adding more and more to the end of your expression until something breaks.


(bus) #3

It doesn't work because my line doesn't start with my "NOM_BATCH" field:

* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

To test as you suggested, I used only:

%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}

And I get a _grokparsefailure again. Really strange.


(Magnus Bäck) #4

So your line actually just contains

* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

? What about INFO;0000;000003;?


(bus) #5

In my log file, my log line is:

INFO;0000;000003;* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

I parse this log line into four fields with:

match => { "message" => [ "%{WORD:TYPE};%{DATA:ID1};%{NUMBER:ID2};%{GREEDYDATA:DESCRIPTION}" ] }

This parse is OK and works correctly.

And then I parse the result with:

%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}

And this last one doesn't work properly.

Same for %{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}


(Magnus Bäck) #6

That grok expression works fine:

$ cat test.config 
input { stdin { } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => [
      "message",
      "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}"
    ]
  }
}
$ cat data 
* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 8
Pipeline main started
{
                  "message" => "* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *",
                 "@version" => "1",
               "@timestamp" => "2018-07-02T10:29:31.098Z",
                     "host" => "lnxolofon",
                "NOM_BATCH" => "WKF998MT",
       "DATE_JOURNEE_BATCH" => "20180621",
     "DATE_EXECUTION_BATCH" => "20180621",
    "HEURE_EXECUTION_BATCH" => "1301"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

Exactly what does your configuration look like?


(bus) #7

I'm not surprised that it works, because this expression works on all my other log lines except this one. I don't really understand.

My complete pipeline looks like this:

input
{
  beats
  {
    port => 5044
  }
}


filter
{
  # Grok that splits the file's lines in the format: TYPE;ID1;ID2;DESCRIPTION
  grok
  {
    match => { "message" => [ "%{WORD:TYPE};%{DATA:ID1};%{NUMBER:ID2};%{GREEDYDATA:DESCRIPTION}" ] }
  }

  # If the description contains "/data/EDT/batchs/files/logs" (i.e. the log file line), extract 4 fields.
  if ([DESCRIPTION] =~ "\/data\/EDT\/batchs\/files\/logs\/")
  {
    # Grok that extracts the batch name, its business day, its execution date and its execution time from this line
    grok
    {
      match => { "DESCRIPTION" => [ "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}" ] }
    }

        # If the batch name starts with "E-"
        if ([NOM_BATCH] =~ /^E-/)
        {
          # Remove the previously saved fields
          mutate
          {
        remove_field => [ "NOM_BATCH","DATE_JOURNEE_BATCH","DATE_EXECUTION_BATCH","HEURE_EXECUTION_BATCH"]
          }

          # Grok that extracts the batch name, business day, execution date and time [NAME FORMAT DIFFERENT FROM THE OTHERS]
      grok
      {
        match => { "DESCRIPTION" => [ "(?<NOM_BATCH>E-[0-9A-Z]+)-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}" ] }
      }

          # Add a "PATH_BATCH" field containing all the saved fields + ".log" [gives the complete log file name]
      mutate
      {
        add_field => { "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}.log" }
      }
        }
        # Else, if the batch name starts with "WKF99"
        else if ([NOM_BATCH] =~ /^WKF99/)
        {
      # Remove the previously saved fields
          mutate
          {
        remove_field => [ "NOM_BATCH","DATE_JOURNEE_BATCH","DATE_EXECUTION_BATCH","HEURE_EXECUTION_BATCH"]
          }

      # Grok that extracts the batch name, business day, execution date and time [NAME FORMAT DIFFERENT FROM THE OTHERS]
      grok
      {
        match => { "DESCRIPTION" => [ "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}-%{GREEDYDATA:SUITE}.log" ] }
      }

      # Add a "PATH_BATCH" field containing all the saved fields + ".log" [gives the complete log file name]
      mutate
      {
        add_field => { "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{SUITE}.log" }
      }
        }

        # Else the file starts with neither "E-" nor "WKF99", so it's a "normal" file; add the "PATH_BATCH" field
        else
        {
      mutate
      {
        add_field => { "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}.log" }
      }
        }

        # Save all the previously extracted data in globally scoped Ruby class variables.
    ruby { code => "@@save_the_name = event.get('NOM_BATCH')" }
        ruby { code => "@@save_the_datejournee = event.get('DATE_JOURNEE_BATCH')" }
    ruby { code => "@@save_the_dateexecution = event.get('DATE_EXECUTION_BATCH')" }
        ruby { code => "@@save_the_hourexecution = event.get('HEURE_EXECUTION_BATCH')" }
    ruby { code => "@@save_the_path = event.get('PATH_BATCH')" }
  }
  # Else the line contains no information to save, so attach the information saved above.
  else
  {
    ruby { code => "event.set('NOM_BATCH', @@save_the_name)" }
    ruby { code => "event.set('DATE_JOURNEE_BATCH', @@save_the_datejournee)" }
        ruby { code => "event.set('DATE_EXECUTION_BATCH', @@save_the_dateexecution)" }
        ruby { code => "event.set('HEURE_EXECUTION_BATCH', @@save_the_hourexecution)" }
    ruby { code => "event.set('PATH_BATCH', @@save_the_path)" }
  }

  # If the description contains "TEMPS D'EXECUTION", extract the duration
  if ([DESCRIPTION] =~ "TEMPS D'EXECUTION")
  {
    # Grok that extracts the duration
    grok
    {
      match => { "DESCRIPTION" => [": %{GREEDYDATA:DUREE}"] }
    }

        # Save the duration in a class variable
        ruby { code => "@@save_the_duree = event.get('DUREE')" }
  }

  # If the description contains "CODE RETOUR", extract the return code
  if ([DESCRIPTION] =~ "CODE RETOUR")
  {
    # Grok that extracts the return code
    grok
    {
      match => { "DESCRIPTION" => [ "%{NUMBER:CODE_RETOUR:int}" ] }
    }

    # Attach the previously saved duration to this (return code) event
    ruby { code => "event.set('DUREE', @@save_the_duree)" }
  }

  # If the description contains a run of 20 or more *, drop the whole line (file banner lines)
  if [DESCRIPTION] =~ "\*{20,}"
  {
    drop { }
  }

  # If the description contains "LOG D'EXECUTION", drop the whole line (file banner line)
  if [DESCRIPTION] =~ "LOG D'EXECUTION"
  {
    drop { }
  }

  # For batch WKF999 or batch WKF997, keep only a few lines:
  # INFO lines whose ID2 differs from 000001 (to keep the banner)
  # and whose description differs from "info BDD", "CODE RETOUR" and "TEMPS D'EXECUTION".
  # Everything else is dropped.
  if ([NOM_BATCH] == "WKF999" or [NOM_BATCH] == "WKF997")
  {
    if ([TYPE] == "INFO")
    {
      if ([ID2] != "000001")
      {
        if ([DESCRIPTION] !~ "info BDD" and [DESCRIPTION] !~ "CODE RETOUR" and [DESCRIPTION] !~ "TEMPS D'EXECUTION")
            {
              drop { }
            }
      }
    }
  }
  # Else the batch file is well-formed; keep the batch path line, its functional description, and the rest as above
  else
  {
    if ([TYPE] == "INFO")
    {
      if ([ID2] != "000003" and [ID2] != "000005")
      {
        if ([DESCRIPTION] !~ "info BDD" and [DESCRIPTION] !~ "CODE RETOUR" and [DESCRIPTION] !~ "TEMPS D'EXECUTION")
        {
          drop { }
        }
      }
    }
  }
  date
  {
    match => [ "DATE_JOURNEE_BATCH", "yyyyMMdd" ]
  }
}


output
{
  elasticsearch
  {
    hosts => "http://localhost:9200"
    index => "loupe"
  }
  stdout { codec => rubydebug }
}
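The ruby filters above pass values from the banner line to the following events through Ruby class variables (`@@...`). A minimal plain-Ruby sketch of that carry-over mechanism, outside Logstash and with hypothetical plain hashes standing in for Logstash events:

```ruby
# Hypothetical stand-alone illustration of the carry-over idea:
# a banner event saves its fields; later events inherit them.
class Carrier
  @@save_the_name = nil

  def self.process(event)
    if event['DESCRIPTION'].include?('/data/EDT/batchs/files/logs/')
      @@save_the_name = event['NOM_BATCH']   # banner line: save
    else
      event['NOM_BATCH'] = @@save_the_name   # later line: restore
    end
    event
  end
end

header = { 'DESCRIPTION' => '* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *',
           'NOM_BATCH'   => 'WKF998MT' }
later  = { 'DESCRIPTION' => 'CODE RETOUR : 0' }

Carrier.process(header)
Carrier.process(later)
# later['NOM_BATCH'] is now "WKF998MT"
```

Note that this style assumes events arrive strictly in file order; with more than one pipeline worker, events from several files can interleave and the class variables can be overwritten mid-file.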

(Magnus Bäck) #8

And what does the stdout { codec => rubydebug } output produce for a sample message?


(bus) #9

A Java error that loops endlessly prevents me from getting the rubydebug output :confused:


(Magnus Bäck) #10

Okay, that sounds like a known bug in a recent Logstash release. What if you use a json_lines codec instead of rubydebug?


(bus) #11

Sometimes it works.. I just tried again and it worked (the Java error disappeared), so with another log I get:

{
                 "@version" => "1",
                   "source" => "C:\\Users\\busy\\Desktop\\titi\\trie-1153628000.log",
                     "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
               "@timestamp" => 2018-06-28T22:00:00.000Z,
                  "message" => "INFO;0000;000003;* /data/EDT/batchs/files/logs/ACT003MT-20180606-20180607-0014.log *",
                     "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
                      "ID2" => "000003",
              "DESCRIPTION" => "* /data/EDT/batchs/files/logs/ACT003MT-20180606-20180607-0014.log *",
                "NOM_BATCH" => "ACT003MT",
               "prospector" => {
        "type" => "log"
    },
                      "ID1" => "0000",
     "DATE_EXECUTION_BATCH" => "20180606",
                   "offset" => 13446,
                     "TYPE" => "INFO",
       "DATE_JOURNEE_BATCH" => "20180606",
    "HEURE_EXECUTION_BATCH" => "0014",
               "PATH_BATCH" => "ACT003MT-20180606-20180607-0014.log",
                     "host" => "PC2609",
                 "filetype" => "VegaProdUEM"
}

It works fine. It's only with WKF998MT that it doesn't work... I don't understand at all ^^


(Magnus Bäck) #12

I want to know what an event that doesn't work looks like.


(bus) #13

OK, this one doesn't work:

{
        "message" => "INFO;0000;000003;*                        /data/EDT/batchs/files/logs/WKF998MT-20180427                                                                                -20180428-0122.log                         *",
           "host" => "PC2609",
     "prospector" => {
        "type" => "log"
    },
         "offset" => 396,
       "filetype" => "VegaProdUEM",
    "DESCRIPTION" => "*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.lo                                                                                g                         *",
           "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_grokparsefailure"
    ],
     "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{S                                                                                UITE}.log",
           "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
            "ID2" => "000003",
           "TYPE" => "INFO",
            "ID1" => "0000",
         "source" => "C:\\Users\\busy\\Desktop\\titi\\WKF998MT-20180427-20180428-0122.log",
       "@version" => "1",
     "@timestamp" => 2018-07-02T12:07:51.273Z
}

And from the same log file as above (the CODE RETOUR event):

{
                  "message" => "INFO;0000;000513;CODE RETOUR : 0",
                     "host" => "PC2609",
               "prospector" => {
        "type" => "log"
    },
                   "offset" => 1652,
                 "filetype" => "VegaProdUEM",
              "DESCRIPTION" => "CODE RETOUR : 0",
    "HEURE_EXECUTION_BATCH" => nil,
                     "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
               "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_                                                                                BATCH}-%{SUITE}.log",
                     "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
              "CODE_RETOUR" => 0,
     "DATE_EXECUTION_BATCH" => nil,
                      "ID2" => "000513",
                    "DUREE" => "0h 0' 21\" 842ms",
                     "TYPE" => "INFO",
                      "ID1" => "0000",
                "NOM_BATCH" => nil,
       "DATE_JOURNEE_BATCH" => nil,
                   "source" => "C:\\Users\\busy\\Desktop\\titi\\WKF998MT-20180427-20180428-0122.log",
                 "@version" => "1",
               "@timestamp" => 2018-07-02T12:07:51.273Z
}

(Magnus Bäck) #14

But the line

INFO;0000;000003;*                        /data/EDT/batchs/files/logs/WKF998MT-20180427                                                                                -20180428-0122.log                         *

contains lots of whitespace in the middle of the string. If this is something you're fine with you'll have to adjust the grok expression accordingly.
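For instance, if the stray spaces around the hyphens were real, one way to tolerate them would be optional whitespace in the expression (an untested sketch, not the original poster's pattern):

```
%{WORD:NOM_BATCH}\s*-\s*%{BASE16NUM:DATE_JOURNEE_BATCH}\s*-\s*%{BASE16NUM:DATE_EXECUTION_BATCH}\s*-\s*%{BASE16NUM:HEURE_EXECUTION_BATCH}
```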


(bus) #15

It's a display formatting error; no whitespace is present in the real event line.


(Magnus Bäck) #16

So please fix the display problem and post a failing message as it really looks.


(bus) #17
{
        "message" => "INFO;0000;000003;*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
           "host" => "PC2609",
     "prospector" => {
        "type" => "log"
    },
         "offset" => 396,
       "filetype" => "VegaProdUEM",
    "DESCRIPTION" => "*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
           "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_grokparsefailure"
    ],
     "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{SUITE}.log",
           "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
            "ID2" => "000003",
           "TYPE" => "INFO",
            "ID1" => "0000",
         "source" => "C:\\Users\\busy\\Desktop\\titi\\WKF998MT-20180427-20180428-0122.log",
       "@version" => "1",
     "@timestamp" => 2018-07-02T12:07:51.273Z
}

(Magnus Bäck) #18

Works fine here:

$ cat test.config 
input { stdin { } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => [
      "message",
      "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}"
    ]
  }
}
$ cat data 
*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 8
Pipeline main started
{
                  "message" => "*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
                 "@version" => "1",
               "@timestamp" => "2018-07-02T14:07:42.149Z",
                     "host" => "lnxolofon",
                "NOM_BATCH" => "WKF998MT",
       "DATE_JOURNEE_BATCH" => "20180427",
     "DATE_EXECUTION_BATCH" => "20180428",
    "HEURE_EXECUTION_BATCH" => "0122"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

If you do exactly the same thing as above on your setup, what happens? If that works maybe there's some other filter that you have that's creating the problems?


(bus) #19

It works..

OK, so I don't know why it doesn't work in my "big" filter..

EDIT: A grok inside an if condition was wrong. It's fixed now.

Thanks for the help @magnusbaeck


(system) #20

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.