_grokparsefailure but it works in the grok debugger

Hello everybody, I use Logstash to grok log lines.

For example, one of these lines is:

INFO;0000;000003;* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

I use a first grok to split this event into four fields, like this:

match => { "message" => [ "%{WORD:TYPE};%{DATA:ID1};%{NUMBER:ID2};%{GREEDYDATA:DESCRIPTION}" ] }

So now:

"description" => * /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *
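For readers who want to check that split outside Logstash, here is a rough Ruby stand-in for the first grok expression (a simplification I'm assuming here: WORD ≈ `\w+`, DATA ≈ `.*?`, NUMBER ≈ `\d+`, GREEDYDATA ≈ `.*`):

```ruby
# Rough stand-in for the first grok expression, using simplified pattern equivalents.
line = "INFO;0000;000003;* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *"
m = /(?<TYPE>\w+);(?<ID1>.*?);(?<ID2>\d+);(?<DESCRIPTION>.*)/.match(line)
puts m[:TYPE]  # "INFO"
puts m[:ID1]   # "0000"
puts m[:ID2]   # "000003"
```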

Then I grok this DESCRIPTION line in the grok debugger like this:

%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}

Result :

{
  "NOM_BATCH": [
    "WKF998MT"
  ],
  "DATE_JOURNEE_BATCH": [
    "20180621"
  ],
  "DATE_EXECUTION_BATCH": [
    "20180621"
  ],
  "HEURE_EXECUTION_BATCH": [
    "1301"
  ]
} 
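That expression can also be sanity-checked as a plain regex (assuming, for these all-digit values, that BASE16NUM effectively matches what `\d+` would):

```ruby
# Simplified regex equivalent of the batch-name grok expression.
desc = "* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *"
pattern = /(?<NOM_BATCH>\w+)-(?<DATE_JOURNEE_BATCH>\d+)-(?<DATE_EXECUTION_BATCH>\d+)-(?<HEURE_EXECUTION_BATCH>\d+)/
m = pattern.match(desc)
puts m[:NOM_BATCH]              # "WKF998MT"
puts m[:HEURE_EXECUTION_BATCH]  # "1301"
```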

That's correct. But in Kibana, I get a _grokparsefailure on this line...

To be precise: this same grok works very well on my other log lines, for example these:

ACT003MT-20180606-20180607-0014.log
RLC006MT-20180509-20180511-0228.log
RLC004-20180530-20180530-2236.log
WKF999-20180422-20180422-1937-34-120.log
WKF997-20180622-20180622-2127-42-9000-03-20180622-212829.log

It's very strange, can you help me?

Build your expression gradually. Start with the simplest possible expression, ^%{WORD:NOM_BATCH}. Does that work? Yes? Then continue adding more and more to the end of your expression until something breaks.

It doesn't work because my line doesn't start with my "NOM_BATCH" field:

* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

To test as you suggested, I used only:

%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}

And I get a _grokparsefailure again. Even stranger.

So your line actually just contains

* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

? What about INFO;0000;000003;?

In my log file, the log line is:

INFO;0000;000003;* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *

I parse this log line to get four fields with:

match => { "message" => [ "%{WORD:TYPE};%{DATA:ID1};%{NUMBER:ID2};%{GREEDYDATA:DESCRIPTION}" ] }

This parse is OK and works correctly.

And then I parse the resulting DESCRIPTION with:

%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}

And this last one doesn't work properly.

Same for %{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}

That grok expression works fine:

$ cat test.config 
input { stdin { } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => [
      "message",
      "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}"
    ]
  }
}
$ cat data 
* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 8
Pipeline main started
{
                  "message" => "* /data/EDT/batchs/files/logs/WKF998MT-20180621-20180621-1301.log *",
                 "@version" => "1",
               "@timestamp" => "2018-07-02T10:29:31.098Z",
                     "host" => "lnxolofon",
                "NOM_BATCH" => "WKF998MT",
       "DATE_JOURNEE_BATCH" => "20180621",
     "DATE_EXECUTION_BATCH" => "20180621",
    "HEURE_EXECUTION_BATCH" => "1301"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

Exactly what does your configuration look like?

I'm not surprised it works for you, because this expression works on all my other log lines except this one. I really don't understand.

My complete pipeline looks like this:

input
{
  beats
  {
    port => 5044
  }
}


filter
{
  # Grok that splits the file's lines into the format TYPE;ID1;ID2;DESCRIPTION
  grok
  {
    match => { "message" => [ "%{WORD:TYPE};%{DATA:ID1};%{NUMBER:ID2};%{GREEDYDATA:DESCRIPTION}" ] }
  }

  # If the description contains "/data/EDT/batchs/files/logs" (i.e. the log file path line), extract 4 fields.
  if ([DESCRIPTION] =~ "\/data\/EDT\/batchs\/files\/logs\/")
  {
    # Grok to extract the batch name, its business day, its execution date and its execution time from this line
    grok
    {
      match => { "DESCRIPTION" => [ "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}" ] }
    }

    # If the batch name starts with "E-"
    if ([NOM_BATCH] =~ /^E-/)
    {
      # Remove the previously saved fields
      mutate
      {
        remove_field => [ "NOM_BATCH","DATE_JOURNEE_BATCH","DATE_EXECUTION_BATCH","HEURE_EXECUTION_BATCH"]
      }

      # Grok to extract the batch name, its business day, its execution date and its execution time [NAME FORMAT DIFFERENT FROM THE OTHERS]
      grok
      {
        match => { "DESCRIPTION" => [ "(?<NOM_BATCH>E-[0-9A-Z]+)-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}" ] }
      }

      # Add a "PATH_BATCH" field containing all the saved fields + ".log" [gives the full log file name]
      mutate
      {
        add_field => { "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}.log" }
      }
    }
    # Otherwise, if the batch name starts with "WKF99"
    else if ([NOM_BATCH] =~ /^WKF99/)
    {
      # Remove the previously saved fields
      mutate
      {
        remove_field => [ "NOM_BATCH","DATE_JOURNEE_BATCH","DATE_EXECUTION_BATCH","HEURE_EXECUTION_BATCH"]
      }

      # Grok to extract the batch name, its business day, its execution date and its execution time [NAME FORMAT DIFFERENT FROM THE OTHERS]
      grok
      {
        match => { "DESCRIPTION" => [ "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}-%{GREEDYDATA:SUITE}.log" ] }
      }

      # Add a "PATH_BATCH" field containing all the saved fields + ".log" [gives the full log file name]
      mutate
      {
        add_field => { "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{SUITE}.log" }
      }
    }

    # Otherwise the name starts with neither "E-" nor "WKF99", so it's a "normal" file; add the "PATH_BATCH" field
    else
    {
      mutate
      {
        add_field => { "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}.log" }
      }
    }

    # Save all of the data captured above in globally scoped class variables.
    ruby { code => "@@save_the_name = event.get('NOM_BATCH')" }
    ruby { code => "@@save_the_datejournee = event.get('DATE_JOURNEE_BATCH')" }
    ruby { code => "@@save_the_dateexecution = event.get('DATE_EXECUTION_BATCH')" }
    ruby { code => "@@save_the_hourexecution = event.get('HEURE_EXECUTION_BATCH')" }
    ruby { code => "@@save_the_path = event.get('PATH_BATCH')" }
  }
  # Otherwise the line contains no information to save, so graft the information saved above onto it.
  else
  {
    ruby { code => "event.set('NOM_BATCH', @@save_the_name)" }
    ruby { code => "event.set('DATE_JOURNEE_BATCH', @@save_the_datejournee)" }
    ruby { code => "event.set('DATE_EXECUTION_BATCH', @@save_the_dateexecution)" }
    ruby { code => "event.set('HEURE_EXECUTION_BATCH', @@save_the_hourexecution)" }
    ruby { code => "event.set('PATH_BATCH', @@save_the_path)" }
  }

  # If the description contains "TEMPS D'EXECUTION", extract that duration
  if ([DESCRIPTION] =~ "TEMPS D'EXECUTION")
  {
    # Grok to extract the duration
    grok
    {
      match => { "DESCRIPTION" => [": %{GREEDYDATA:DUREE}"] }
    }

    # Save the duration in a class variable
    ruby { code => "@@save_the_duree = event.get('DUREE')" }
  }

  # If the description contains "CODE RETOUR", extract that return code
  if ([DESCRIPTION] =~ "CODE RETOUR")
  {
    # Grok to extract the return code
    grok
    {
      match => { "DESCRIPTION" => [ "%{NUMBER:CODE_RETOUR:int}" ] }
    }

    # Attach the duration saved above to this event
    ruby { code => "event.set('DUREE', @@save_the_duree)" }
  }

  # If the description contains a run of 20 or more *, drop the whole line (file banner lines)
  if [DESCRIPTION] =~ "\*{20,}"
  {
    drop { }
  }

  # If the description contains "LOG D'EXECUTION", drop the whole line (file banner line)
  if [DESCRIPTION] =~ "LOG D'EXECUTION"
  {
    drop { }
  }

  # If the batch is WKF999 or WKF997, keep only a few lines:
  # INFO lines whose ID2 is 000001 (to keep the banner)
  # or whose description contains "info BDD", "CODE RETOUR" or "TEMPS D'EXECUTION"
  # Everything else is dropped
  if ([NOM_BATCH] == "WKF999" or [NOM_BATCH] == "WKF997")
  {
    if ([TYPE] == "INFO")
    {
      if ([ID2] != "000001")
      {
        if ([DESCRIPTION] !~ "info BDD" and [DESCRIPTION] !~ "CODE RETOUR" and [DESCRIPTION] !~ "TEMPS D'EXECUTION")
        {
          drop { }
        }
      }
    }
  }
  # Otherwise the batch file is well formed; keep the batch path line, its functional description, and the rest as above
  else
  {
    if ([TYPE] == "INFO")
    {
      if ([ID2] != "000003" and [ID2] != "000005")
      {
        if ([DESCRIPTION] !~ "info BDD" and [DESCRIPTION] !~ "CODE RETOUR" and [DESCRIPTION] !~ "TEMPS D'EXECUTION")
        {
          drop { }
        }
      }
    }
  }
  date
  {
    match => [ "DATE_JOURNEE_BATCH", "yyyyMMdd" ]
  }
}


output
{
  elasticsearch
  {
    hosts => "http://localhost:9200"
    index => "loupe"
  }
  stdout { codec => rubydebug }
}
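As an aside on those ruby filters: they rely on class variables (`@@...`) to carry fields from a header line over to the detail lines that follow it. A minimal sketch of that carry-forward pattern, using a hypothetical `Carry` class in place of Logstash events:

```ruby
# Sketch of the carry-forward pattern the ruby filters implement.
# A class variable survives across calls, so a header line can pass
# its fields on to the detail lines processed after it.
class Carry
  @@saved = {}

  def self.process(fields)
    if fields.key?("NOM_BATCH")
      @@saved["NOM_BATCH"] = fields["NOM_BATCH"]  # header line: remember the value
    else
      fields["NOM_BATCH"] = @@saved["NOM_BATCH"]  # detail line: graft it back on
    end
    fields
  end
end

Carry.process({ "NOM_BATCH" => "WKF998MT" })  # header line seen first
puts Carry.process({})["NOM_BATCH"]           # "WKF998MT"
```

Note that this pattern only behaves deterministically with a single pipeline worker: with several workers, events are filtered concurrently and may reach the ruby filters out of order.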

And what does the stdout { codec => rubydebug } output produce for a sample message?

A Java error that loops endlessly prevents me from getting rubydebug output :confused:

Okay, that sounds like a known bug in a recent Logstash release. What if you use a json_lines codec instead of rubydebug?

Sometimes it works... I just tried again and it worked (the Java error disappeared), so with another log I have:

{
                 "@version" => "1",
                   "source" => "C:\\Users\\busy\\Desktop\\titi\\trie-1153628000.log",
                     "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
               "@timestamp" => 2018-06-28T22:00:00.000Z,
                  "message" => "INFO;0000;000003;* /data/EDT/batchs/files/logs/ACT003MT-20180606-20180607-0014.log *",
                     "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
                      "ID2" => "000003",
              "DESCRIPTION" => "* /data/EDT/batchs/files/logs/ACT003MT-20180606-20180607-0014.log *",
                "NOM_BATCH" => "ACT003MT",
               "prospector" => {
        "type" => "log"
    },
                      "ID1" => "0000",
     "DATE_EXECUTION_BATCH" => "20180606",
                   "offset" => 13446,
                     "TYPE" => "INFO",
       "DATE_JOURNEE_BATCH" => "20180606",
    "HEURE_EXECUTION_BATCH" => "0014",
               "PATH_BATCH" => "ACT003MT-20180606-20180607-0014.log",
                     "host" => "PC2609",
                 "filetype" => "VegaProdUEM"
}

It works fine. Only with WKF998MT does it fail... I don't understand at all ^^

I want to know what an event that doesn't work looks like.

OK, this one doesn't work:

{
        "message" => "INFO;0000;000003;*                        /data/EDT/batchs/files/logs/WKF998MT-20180427                                                                                -20180428-0122.log                         *",
           "host" => "PC2609",
     "prospector" => {
        "type" => "log"
    },
         "offset" => 396,
       "filetype" => "VegaProdUEM",
    "DESCRIPTION" => "*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
           "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_grokparsefailure"
    ],
     "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{SUITE}.log",
           "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
            "ID2" => "000003",
           "TYPE" => "INFO",
            "ID1" => "0000",
         "source" => "C:\\Users\\busy\\Desktop\\titi\\WKF998MT-20180427-20180428-0122.log",
       "@version" => "1",
     "@timestamp" => 2018-07-02T12:07:51.273Z
}

And another event from the same file (the CODE RETOUR event):

{
                  "message" => "INFO;0000;000513;CODE RETOUR : 0",
                     "host" => "PC2609",
               "prospector" => {
        "type" => "log"
    },
                   "offset" => 1652,
                 "filetype" => "VegaProdUEM",
              "DESCRIPTION" => "CODE RETOUR : 0",
    "HEURE_EXECUTION_BATCH" => nil,
                     "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
                "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{SUITE}.log",
                     "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
              "CODE_RETOUR" => 0,
     "DATE_EXECUTION_BATCH" => nil,
                      "ID2" => "000513",
                    "DUREE" => "0h 0' 21\" 842ms",
                     "TYPE" => "INFO",
                      "ID1" => "0000",
                "NOM_BATCH" => nil,
       "DATE_JOURNEE_BATCH" => nil,
                   "source" => "C:\\Users\\busy\\Desktop\\titi\\WKF998MT-20180427-20180428-0122.log",
                 "@version" => "1",
               "@timestamp" => 2018-07-02T12:07:51.273Z
}

But the line

INFO;0000;000003;*                        /data/EDT/batchs/files/logs/WKF998MT-20180427                                                                                -20180428-0122.log                         *

contains lots of whitespace in the middle of the string. If this is something you're fine with you'll have to adjust the grok expression accordingly.

It's a display formatting error; that whitespace isn't present in the real event line.

So please fix the display problem and post a failing message as it really looks.

{
        "message" => "INFO;0000;000003;*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
           "host" => "PC2609",
     "prospector" => {
        "type" => "log"
    },
         "offset" => 396,
       "filetype" => "VegaProdUEM",
    "DESCRIPTION" => "*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
           "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_grokparsefailure"
    ],
     "PATH_BATCH" => "%{NOM_BATCH}-%{DATE_JOURNEE_BATCH}-%{DATE_EXECUTION_BATCH}-%{HEURE_EXECUTION_BATCH}-%{SUITE}.log",
           "beat" => {
         "version" => "6.2.4",
            "name" => "PC2609",
        "hostname" => "PC2609"
    },
            "ID2" => "000003",
           "TYPE" => "INFO",
            "ID1" => "0000",
         "source" => "C:\\Users\\busy\\Desktop\\titi\\WKF998MT-20180427-20180428-0122.log",
       "@version" => "1",
     "@timestamp" => 2018-07-02T12:07:51.273Z
}

Works fine here:

$ cat test.config 
input { stdin { } }
output { stdout { codec => rubydebug } }
filter {
  grok {
    match => [
      "message",
      "%{WORD:NOM_BATCH}-%{BASE16NUM:DATE_JOURNEE_BATCH}-%{BASE16NUM:DATE_EXECUTION_BATCH}-%{BASE16NUM:HEURE_EXECUTION_BATCH}"
    ]
  }
}
$ cat data 
*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 8
Pipeline main started
{
                  "message" => "*                        /data/EDT/batchs/files/logs/WKF998MT-20180427-20180428-0122.log                         *",
                 "@version" => "1",
               "@timestamp" => "2018-07-02T14:07:42.149Z",
                     "host" => "lnxolofon",
                "NOM_BATCH" => "WKF998MT",
       "DATE_JOURNEE_BATCH" => "20180427",
     "DATE_EXECUTION_BATCH" => "20180428",
    "HEURE_EXECUTION_BATCH" => "0122"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

If you do exactly the same thing as above on your setup, what happens? If that works maybe there's some other filter that you have that's creating the problems?

It works...

OK, so I don't know why it doesn't work in my "big" filter...

EDIT: A grok inside one of the if conditions was wrong... It's fixed now.

Thx for help @magnusbaeck
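For future readers, a likely reading of what went wrong: the failing event above shows PATH_BATCH with an unexpanded %{SUITE}, which suggests the WKF998MT lines matched the `^WKF99` condition and fell into the branch whose grok requires an extra `-SUITE` segment that those file names don't have. A simplified regex comparison of the two expressions (BASE16NUM approximated as `\d+` for these all-digit values):

```ruby
# Simplified regex stand-ins for the WKF99-branch grok and the default grok.
WKF99_PATTERN   = /(?<nom>\w+)-(?<d1>\d+)-(?<d2>\d+)-(?<h>\d+)-(?<suite>.+)\.log/
DEFAULT_PATTERN = /(?<nom>\w+)-(?<d1>\d+)-(?<d2>\d+)-(?<h>\d+)/

with_suite    = "WKF999-20180422-20180422-1937-34-120.log"
without_suite = "WKF998MT-20180427-20180428-0122.log"

puts WKF99_PATTERN.match?(with_suite)      # true
puts WKF99_PATTERN.match?(without_suite)   # false: no extra "-SUITE" segment
puts DEFAULT_PATTERN.match?(without_suite) # true
```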

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.