Looking for ideas to preprocess logs

Hello guys,

In my log file, multiple threads log to the same file, which produces a badly ordered file that looks like this:

2018-12-13T11:46:13.654+0000 Regulatory [INFO] Transaction etc...
2018-12-13T13:13:22.449+0000 Regulatory [INFO] Transaction etc...
2018-12-13T12:07:41.644+0000 Regulatory [INFO] Transaction etc....
2018-12-13T11:41:44.846+0000 Regulatory [INFO] Transaction etc....   

Is there any way to sort it by timestamp? I am not a Kafka expert, but I'm wondering if it could be the right tool to achieve this.

Does anyone have an idea how to make this work? My final goal is to start processing this data with Logstash in the right order:

2018-12-13T11:41:44.846+0000 Regulatory [INFO] Transaction ...........
2018-12-13T11:46:13.654+0000 Regulatory [INFO] Transaction ...........
2018-12-13T12:07:41.644+0000 Regulatory [INFO] Transaction ...........
2018-12-13T13:13:22.449+0000 Regulatory [INFO] Transaction ...........

Thank you in advance.

Instead of pre-ordering the data, use the timestamp from the event itself and set that as the timestamp to index in Elasticsearch. Then it is nicely sorted in Kibana when you view the data.

You can use a filter like this (the date pattern does not match your timestamp, it is just an example):

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:replace_timestamp}" }
    }
    date {
        match => ["replace_timestamp", "yyyy-MM-dd HH:mm:ss"]
        timezone => "UTC"
        target => "@timestamp"
    }
}
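For the sample lines shown earlier, the timestamp is ISO 8601, so (as a sketch, assuming the format stays consistent across all lines) the date filter can use the built-in ISO8601 pattern instead of a hand-written one:

```
filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:replace_timestamp}" }
    }
    date {
        # ISO8601 is a built-in pattern that handles 2018-12-13T11:46:13.654+0000,
        # including the +0000 offset, so no explicit timezone setting is needed.
        match => ["replace_timestamp", "ISO8601"]
        target => "@timestamp"
    }
}
```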
Thank you @pjanzen. It's a good idea, but I'm using a multiline codec in my input:

  file {
    path => "/usr/share/logstash/test.log"
    start_position => "beginning"
    type => "log"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "(null)+"
      what => "previous"
    }
  }

So, I am looking for a way to pre-order data before the multiline codec.
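One simple option (a sketch; the file names are just examples from this thread): since the ISO 8601 timestamps at the start of each line sort correctly as plain strings, the file can be pre-sorted with the Unix `sort` command before Logstash reads it.

```shell
# Build a sample file like the one in this thread (names are hypothetical).
printf '%s\n' \
  '2018-12-13T11:46:13.654+0000 Regulatory [INFO] Transaction A' \
  '2018-12-13T13:13:22.449+0000 Regulatory [INFO] Transaction B' \
  '2018-12-13T12:07:41.644+0000 Regulatory [INFO] Transaction C' \
  '2018-12-13T11:41:44.846+0000 Regulatory [INFO] Transaction D' > test.log

# Lexicographic sort == chronological sort for ISO 8601 prefixes.
sort test.log > test.sorted.log
head -n 1 test.sorted.log
# → 2018-12-13T11:41:44.846+0000 Regulatory [INFO] Transaction D
```

Logstash's file input would then point at the sorted copy. Note this only works for a static file; it cannot reorder a file that is still being appended to.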

You're using the multiline codec to create one single event before processing it, right? Then I think it would still work. But I could be wrong, of course; I may be missing some information.

Ok, I will explain my use case. In my log file, I have two kinds of lines:

2018-12-13T11:41:44.846+0000 Regulatory [INFO] Transaction : VALIDATE,qf16ft787bif1xs1iuoqihwi9,00000000,100002506,13-12-2018,13-12-2018T11:41:42.447+0000,null,Payment Order,Date not a working day
2018-12-13T13:54:43.646+0000 Regulatory [INFO] Transaction : PROCESS,007069643021v2xs7x08f975bswzkiv5nona,0070696430.2,100002506,13-12-2018,13-12-2018T13:54:42.585+0000,13-12-2018T13:54:43.139+0000,Payment Order,None

As you can see, one line has a null value and the other has a date value. This date can be null, so I am using the multiline codec to merge the lines with null values into a single event.

My goal is to create single events from consecutive null values, which is why I am using the multiline codec. But because multiple threads log to the same file, I first need to pre-order the data by timestamp so that the consecutive-values rule holds.

I hope I was clear enough; my English is not great :smiley:

Thank you @pjanzen

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.