I Can't Parse My Logs with Grok or Dissect Filter

For real! This is one of a thousand reasons why Splunk is much, much better and easier to use than Elastic, and it's why we can't sell this freakin tool to any customer: it's just too hard to use.

With Splunk I parsed a freaking log in 20 minutes using the product's GUI, without writing any code. With Logstash it's so difficult to parse a log with any of the transform tools it has.

OK!! I'm relaxed now! Damn it!!

Well hello everyone!

I'm trying to parse a log in Logstash with the famous Grok and Dissect filter plugins. It has been very difficult for me to understand how this thing works because of the lack of examples in the Logstash docs; they only give one example and assume that's enough for us.

This is an example of the log:

2018259000076;0498;Creacion;2018-09-16 15:24:15;baseuser;Tienda;2018-09-16 15:24:15;MLPARRAF;Capturista;2018-09-16 15:43:07;
2018261000779;0531;Creacion;2018-09-18 19:30:37;baseuser;Tienda;2018-09-18 19:30:37;IRODRIGUEC;Capturista;2018-09-18 19:33:40;
2018298000344;0529;Creacion;2018-10-25 14:06:49;baseuser;Tienda;2018-10-25 14:06:49;KGARCIAF;Capturista;2018-10-25 14:12:22;METENAR;Analisis Documental;2018-10-25 14:16:11;
2018301000689;0535;Creacion;2018-10-28 14:55:34;baseuser;Tienda;2018-10-28 14:55:34;AJMARESR;Capturista;2018-10-28 14:57:48;ZVLOPEZS;Tienda;2018-10-28 14:59:42;MFLORESO01;Capturista;2018-10-28 15:37:58;CHVAZQUEZA;Tienda;2018-10-28 17:13:21;MFLORESO01;Capturista;2018-10-28 17:45:21;AYHEREDIAH;Tienda;2018-10-28 17:47:04;MFLORESO01;Capturista;2018-10-28 18:46:13;PBRITOA;Tienda;2018-10-28 18:49:13;KCOETOM;Capturista;2018-10-28 20:47:02;AJCRUZR;Tienda;2018-10-28 20:48:02;MEHERNANDR01;Capturista;2018-11-12 18:42:44;METENAR;Tienda;2018-11-14 09:13:00;SOSANCHEZN;Capturista;2018-11-18 12:27:20;ZVLOPEZS;Tienda;2018-11-18 12:31:21;
2018301000808;0535;Creacion;2018-10-28 15:35:55;baseuser;Tienda;2018-10-28 15:35:55;AJMARESR;Capturista;2018-10-28 15:38:40;JCGARCIAS;Tienda;2018-10-28 15:44:19;MFLORESO01;Capturista;2018-10-28 16:12:51;AYHEREDIAH;Tienda;2018-10-28 16:16:09;MFLORESO01;Capturista;2018-10-28 17:54:21;GJAZAMARC;Tienda;2018-10-28 17:56:51;MFLORESO01;Capturista;2018-10-28 18:14:08;ADRAMIREZG03;Tienda;2018-10-28 18:16:43;AJMARESR;Capturista;2018-11-02 19:47:43;MCVEGAS;Tienda;2018-11-02 20:16:42;MFLORESO01;Capturista;2018-11-11 18:03:04;PSALGADO;Tienda;2018-11-11 18:04:13;AJMARESR;Capturista;2018-11-20 19:37:21;JSACOSTAP;Tienda;2018-11-21 09:21:47;SOSANCHEZN;Capturista;2018-11-28 12:46:30;AMARTINEZG16;Tienda;2018-11-28 12:55:47;
2018307002054;0563;Creacion;2018-11-03 19:33:49;baseuser;Tienda;2018-11-03 19:33:49;LAESPINOSAB;Capturista;2018-11-03 19:37:36;DCARAPIAL;Analisis Documental;2018-11-03 19:41:59;AORTEGAN;Tienda;2018-11-03 20:27:43;TPROJAST;Capturista;2018-11-18 12:49:54;PBRITOA;Analisis Documental;2018-11-18 12:51:55;BASOTOA;Tienda;2018-11-19 16:25:06;

As you can see, each event starts with a common identifier: 13 digits beginning with 2018 (e.g. 2018259000076). That marks the start of each event. But one of the first issues here is that the events don't all have the same length.

So what I want to do is split each event into fields, like this:

2018259000076;0498;Creacion;2018-09-16 15:24:15;baseuser;Tienda;2018-09-16 15:24:15;MLPARRAF;Capturista;2018-09-16 15:43:07;

reason:2018259000076
store:0498
action1:Creacion
action_date1:2018-09-16 15:24:15
action_user1:baseuser
action2:Tienda
action_date2:2018-09-16 15:24:15
action_user2:MLPARRAF
action3:Capturista
action_date3:2018-09-16 15:43:07
action_user3:""

So in the longer events I can use fields like action13, action_date13, and action_user13.

I tried to use a regular expression that Splunk gave me. It works fine in Kibana's Grok debugger tool and on regex101.com and extracts the fields the way I want, but the moment I put the regular expression into Logstash it doesn't work.

I tried dissect like this:

dissect {
  mapping => {"message" => "%{solicitud} %{tienda} %{accion1} %{fecha_accion1} %{user_accion1} %{accion2} %{fecha_accion2} %{user_accion2} %{accion3} %{fecha_accion3} %{user_accion3} %{accion4} %{fecha_accion4} %{user_accion4} "}
}

but it doesn't work.
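
Thinking about it some more, maybe the problem is that dissect expects the log's literal delimiters to appear in the mapping, and my log is separated by semicolons, not spaces. So I guess it would have to look more like this (just my guess, I haven't got it working yet):

dissect {
  # guessing: the semicolons have to appear literally in the mapping,
  # with a catch-all field at the end for the variable-length tail
  mapping => { "message" => "%{solicitud};%{tienda};%{accion1};%{fecha_accion1};%{user_accion1};%{resto}" }
}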

I also tried grok with the regular expression I mentioned above, but Logstash doesn't recognize it:

grok {
match => {"message" => "(?<solicitud>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)(?<tienda>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion1>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion1>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion1>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion2>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion2>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion2>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion3>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion3>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion3>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion4>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion4>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion4>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion5>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion5>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion5>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion6>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion6>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion6>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion7>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion7>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion7>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion8>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion8>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion8>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion9>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion9>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion9>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion10>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion10>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion10>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion11>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion11>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion11>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion12>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion12>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion12>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion13>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion13>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion13>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion14>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion14>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion
14>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion15>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion15>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion15>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion16>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion16>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion16>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion17>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion17>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion17>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion18>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion18>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion18>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion19>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion19>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion19>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion20>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion20>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion20>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion21>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion21>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion21>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion22>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion22>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion22>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion23>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion23>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<user_accion23>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<accion24>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)(?:\x3b)?(?<fecha_accion25>(?:"(?:[^\\"]|\\.)*"|(?:(?:(?!(?:\x3b)|\\|").)|(?:\\.)))*)"}
}

If you copy this regex and the sample log above into Kibana's Grok debugger or regex101.com, it works and extracts the fields the way I want.

But I don't know why Logstash doesn't recognize it.
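
The only thing I can think of is the quoting: the pattern is full of double quotes, and the match string itself is wrapped in double quotes, so maybe the config parser chokes on it. Perhaps a much simpler pattern in single quotes would at least get through, something like this (untested):

grok {
  # single quotes so the double quotes and backslashes inside the pattern
  # don't clash with the config parser; one [^;]* group per field
  match => { "message" => '(?<solicitud>[^;]*);(?<tienda>[^;]*);(?<accion1>[^;]*);(?<fecha_accion1>[^;]*);(?<user_accion1>[^;]*);%{GREEDYDATA:resto}' }
}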

Thanks a lot, and I hope you'll be generous and kind enough to help me. I'm really very frustrated because I can't solve this.

Why does Elastic have to be so difficult?

Grok and dissect are not the only filters for parsing data. In this case the csv filter might be an option, as it may be easier to read and maintain:

input {
  generator {
    lines => ['2018259000076;0498;Creacion;2018-09-16 15:24:15;baseuser;Tienda;2018-09-16 15:24:15;MLPARRAF;Capturista;2018-09-16 15:43:07;',
              '2018301000689;0535;Creacion;2018-10-28 14:55:34;baseuser;Tienda;2018-10-28 14:55:34;AJMARESR;Capturista;2018-10-28 14:57:48;ZVLOPEZS;Tienda;2018-10-28 14:59:42;MFLORESO01;Capturista;2018-10-28 15:37:58;CHVAZQUEZA;Tienda;2018-10-28 17:13:21;MFLORESO01;Capturista;2018-10-28 17:45:21;AYHEREDIAH;Tienda;2018-10-28 17:47:04;MFLORESO01;Capturista;2018-10-28 18:46:13;PBRITOA;Tienda;2018-10-28 18:49:13;KCOETOM;Capturista;2018-10-28 20:47:02;AJCRUZR;Tienda;2018-10-28 20:48:02;MEHERNANDR01;Capturista;2018-11-12 18:42:44;METENAR;Tienda;2018-11-14 09:13:00;SOSANCHEZN;Capturista;2018-11-18 12:27:20;ZVLOPEZS;Tienda;2018-11-18 12:31:21;']
    count => 1
  } 
} 

filter {
  csv {
    source => "message"
    skip_empty_columns => true
    separator => ";"
    columns => ["reason", "store", "action1", "action_date1", "action_user1", "action2", "action_date2", "action_user2", "action3", "action_date3", "action_user3", "action4", "action_date4", "action_user4", "action5", "action_date5", "action_user5", "action6", "action_date6", "action_user6", "action7", "action_date7", "action_user7", "action8", "action_date8", "action_user8", "action9", "action_date9", "action_user9", "action10", "action_date10", "action_user10", "action11", "action_date11", "action_user11", "action12", "action_date12", "action_user12", "action13", "action_date13", "action_user13", "action14", "action_date14", "action_user14", "action15", "action_date15", "action_user15", "action16", "action_date16", "action_user16", "action17", "action_date17", "action_user17", "action18", "action_date18", "action_user18", "action19", "action_date19", "action_user19", "action20", "action_date20", "action_user20"]
  }
}

output {
  stdout { codec => rubydebug }
}
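
For the first sample line the rubydebug output should contain roughly these fields (abridged; the generator's metadata fields are omitted, and skip_empty_columns drops the empty trailing action_user3):

{
       "reason" => "2018259000076",
        "store" => "0498",
      "action1" => "Creacion",
 "action_date1" => "2018-09-16 15:24:15",
 "action_user1" => "baseuser",
      "action2" => "Tienda",
 "action_date2" => "2018-09-16 15:24:15",
 "action_user2" => "MLPARRAF",
      "action3" => "Capturista",
 "action_date3" => "2018-09-16 15:43:07"
}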

Having numbered columns like in your example can sometimes make the data hard to analyse and plot. If you instead wanted to break each action into a separate event you could do something like this:

input {
  generator {
    lines => ['2018259000076;0498;Creacion;2018-09-16 15:24:15;baseuser;Tienda;2018-09-16 15:24:15;MLPARRAF;Capturista;2018-09-16 15:43:07;',
              '2018301000689;0535;Creacion;2018-10-28 14:55:34;baseuser;Tienda;2018-10-28 14:55:34;AJMARESR;Capturista;2018-10-28 14:57:48;ZVLOPEZS;Tienda;2018-10-28 14:59:42;MFLORESO01;Capturista;2018-10-28 15:37:58;CHVAZQUEZA;Tienda;2018-10-28 17:13:21;MFLORESO01;Capturista;2018-10-28 17:45:21;AYHEREDIAH;Tienda;2018-10-28 17:47:04;MFLORESO01;Capturista;2018-10-28 18:46:13;PBRITOA;Tienda;2018-10-28 18:49:13;KCOETOM;Capturista;2018-10-28 20:47:02;AJCRUZR;Tienda;2018-10-28 20:48:02;MEHERNANDR01;Capturista;2018-11-12 18:42:44;METENAR;Tienda;2018-11-14 09:13:00;SOSANCHEZN;Capturista;2018-11-18 12:27:20;ZVLOPEZS;Tienda;2018-11-18 12:31:21;']
    count => 1
  } 
} 

filter {
  dissect {
    mapping => {"message" => "%{reason};%{store};%{actions}"}
  }

  mutate {
    gsub => ["actions", ";Tienda", "|Tienda", "actions", ";Creacion", "|Creacion", "actions", ";Capturista", "|Capturista"]
  }

  split {
    field => "actions"
    terminator => "|"
    remove_field => ["message"]
  }

  csv {
    source => "actions"
    separator => ";"
    columns => ["action", "action_date", "action_user"]
    remove_field => ["actions"]
  }
}

output {
  stdout { codec => rubydebug }
}
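
Here the gsub step turns each known action name (Creacion, Tienda, Capturista) into a record boundary so that split can emit one event per action, and every emitted event keeps the reason and store fields from the dissect step. For the first sample line the second emitted event would look roughly like this (abridged):

{
      "reason" => "2018259000076",
       "store" => "0498",
      "action" => "Tienda",
 "action_date" => "2018-09-16 15:24:15",
 "action_user" => "MLPARRAF"
}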

Hello Christian!

Many thanks for taking the time to answer!

I will try the csv filter.

Just to mention, I'm going to read from a log file. Will this work the same way?

Best Regards!

The generator is a nice way to provide a self-contained example. It should work fine with a different input as long as the message field contains data with the same structure as in the example.
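
For example, the generator could be swapped for a file input along these lines (the path is just a placeholder for wherever your log file lives; the filter and output sections stay exactly the same):

input {
  file {
    path => "/var/log/solicitudes.log"   # placeholder; point this at your actual log file
    start_position => "beginning"        # also read content that already exists in the file
    sincedb_path => "/dev/null"          # don't remember the read position (handy while testing)
  }
}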
