Specific grok filter for multi-line PostgreSQL log

Hi All,

I am having a hard time trying to merge multi-line PostgreSQL logs into one Logstash event with grok.

My log file looks like this:

Jul 22 17:03:27 my.host example.com[24977]: [137-1] 2016-07-22 17:03:27.339 MSK  User: username  Database: my_db  Host: 192.168.0.52(38494)  Proc ID: 24977  etc1
Jul 22 17:03:27 my.host example.com[24977]: [137-2] 2016-07-22 17:03:27.339 MSK  User: username  Database: my_db  Host: 192.168.0.52(38494)  Proc ID: 24977  etc2
Jul 22 17:03:27 my.host example.com[24597]: [2953-1] 2016-07-22 17:03:27.339 MSK  User: username  Database: my_db  Host: 192.168.0.52(38053)  Proc ID: 24597  etc
Jul 22 17:03:27 my.host example.com[3637]: [3779-1] 2016-07-22 17:03:27.340 MSK  User: username  Database: my_db  Host: 192.168.0.52(17809)  Proc ID: 3637  etc
Jul 22 17:03:27 my.host example.com[24977]: [138-1] 2016-07-22 17:03:27.339 MSK  User: username  Database: my_db  Host: 192.168.0.52(38494)  Proc ID: 24977  etc1
Jul 22 17:03:27 my.host example.com[3637]: [3780-1] 2016-07-22 17:03:27.340 MSK  User: username  Database: my_db  Host: 192.168.0.52(17809)  Proc ID: 3637  etc
Jul 22 17:03:27 my.host example.com[24977]: [138-2] 2016-07-22 17:03:27.339 MSK  User: username  Database: my_db  Host: 192.168.0.52(38494)  Proc ID: 24977  etc2 
Jul 22 17:03:27 my.host example.com[24977]: [139-1] 2016-07-22 17:03:27.340 MSK  User: username  Database: my_db  Host: 192.168.0.52(38494)  Proc ID: 24977  etc
Jul 22 17:03:27 my.host example.com[24597]: [2954-1] 2016-07-22 17:03:27.340 MSK  User: username  Database: my_db  Host: 192.168.0.52(38053)  Proc ID: 24597  etc

I tried to use:

multiline {
  pattern => "... [\d+-1]" # adjust pattern to be more exact
  negate => true
  what => "previous"
}

but I get:

line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1]...
line 5: ...[3780-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

That's not what I need. I want:

line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1] and [138-2]...
line 5: ...[3780-1]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

or

line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[3780-1]...    
line 5: ...[138-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

I would appreciate it if someone could share any hints or their experience with this.

Thanks

Please give me an idea of how to solve this.

You can't reliably use the multiline codec for this since log lines could be interleaved. Perhaps the aggregate filter can help?
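
As an aside, the multiline pattern above also wouldn't match as written: unescaped [ and ] form a regex character class rather than literal brackets. Here's a minimal sketch of the escaped version, which would still be unreliable here precisely because lines from different process IDs interleave:

    multiline {
      # \[ and \] match literal brackets; "[\d+-1]" without backslashes is a character class
      pattern => "\[\d+-1\]"
      negate => true
      what => "previous"
    }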

Thanks. I read about this plugin and I think it's the right one, but I don't know Ruby well enough to write the code section for the aggregate plugin. Can you help me?

Sorry, I'm not that well-versed with aggregate myself and I don't have time to dig around.

Can anyone help me write the code option for the aggregate plugin?

Hi, the aggregate plugin is the right plugin for your need, because your logs are interleaved.

That said, for one id, can you give an example of what you expect as the result document?

I expect the following:

event 1: ...[137-1] and [137-2]...
event 2: ...[2953-1]...
event 3: ...[3779-1]...
event 4: ...[138-1] and [138-2]...
event 5: ...[3780-1]...
event 6: ...[139-1]...
event 7: ...[2954-1] and [2954-2]...

The task_id in this case could be the PID [24977] or the line number 137.

A friend helped me with the Ruby code, and I put together this configuration:

filter {
  grok {
    match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: [%{INT:line}-%{INT:part_of_line}] %{GREEDYDATA:ostatok}" ]
  }

  aggregate {
    task_id => "%{line}"
    code => "
      map['full_message'] ||= ''
      map['full_message'] += event['ostatok']
      event['full_message'] = map['full_message']
    "
  }
}

But I get the following events in Elasticsearch:

event 1: ...[137-1]...
event 2: ...[137-1] and [137-2]...
event 3: ...[2953-1]...
event 4: ...[3779-1]...
event 5: ...[138-1]...
event 6: ...[138-1] and [138-2]...
event 7: ...[3780-1]...
event 8: ...[139-1]...
event 9: ...[2954-1]...
event 10: ...[2954-1] and [2954-2]...

How can I get rid of the partial events, i.e. event 1 ([137-1]), event 5 ([138-1]), and event 9 ([2954-1])?

@111126

OK, given your Logstash configuration, I now understand your need.
Your main problem is that you have no explicit "end line" marker to flush the aggregate map.
But you're lucky! Last week, the aggregate plugin was released with new options to deal with exactly that case!

So, here's the right configuration for your need:

    grok {
        match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: \[%{INT:line}-%{INT:part_of_line}\] %{GREEDYDATA:ostatok}" ]
    }

    aggregate {
        task_id => "%{line}"
        code =>  "
            map.merge!(event) if map.empty?      # keep all fields from the first line of the group
            map['full_message'] ||= ''
            map['full_message'] += event['ostatok']
        "
        timeout => 10                            # no explicit end marker, so flush after 10s of inactivity
        push_map_as_event_on_timeout => true     # emit the aggregated map as a new event when the timeout fires
        timeout_code => "event.tag('aggregated')"
    }

    # keep only the aggregated events; drop the original per-line events
    if "aggregated" not in [tags] {
        drop {}
    }

Note that in your grok expression, the [ and ] characters must be escaped.
As for aggregate, the main idea is that since you have no explicit "end log line", we use a 10-second timeout to push the aggregated map into the pipeline as a new Logstash event.
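
To illustrate, here is roughly what the event pushed on timeout for line group 137 would look like, assuming the sample log above (field names come from the grok pattern; values are abridged, and the exact set of metadata fields depends on your Logstash version):

    {
        "logsource"    => "my.host",
        "pid"          => "24977",
        "line"         => "137",
        "full_message" => "2016-07-22 17:03:27.339 MSK  User: username ... etc12016-07-22 17:03:27.339 MSK ... etc2",
        "tags"         => ["aggregated"]
    }

Note that the parts are concatenated with no separator ("etc1" runs straight into the next part's timestamp); if that matters, appending a space or newline between parts in the code block would keep them readable.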


Thank you a lot, it works for me!
However, after implementing your settings, I ran into something that wasn't obvious before: my log is very big and is forwarded from many servers, so line numbers collide. Example:

Aug 11 11:34:53 my.host1 example.com1[21872]: [198-1] 2016-08-11 11:34:53.029 MSK etc
Aug 11 11:34:53 my.host1 example.com1[21878]: [198-1] 2016-08-11 11:34:53.150 MSK etc
Aug 11 11:34:53 my.host1 example.com1[21879]: [198-1] 2016-08-11 11:34:53.515 MSK etc
Aug 11 11:34:53 my.host3 example.com3[16548]: [17198-1] 2016-08-11 11:34:53.529 MSK etc
Aug 11 11:34:53 my.host2 example.com2[19241]: [198-1] 2016-08-11 11:34:53.722 MSK etc
Aug 11 11:34:53 my.host2 example.com2[19017]: [198-1] 2016-08-11 11:34:53.873 MSK etc
Aug 11 11:34:54 my.host1 example.com1[21901]: [198-1] 2016-08-11 11:34:54.091 MSK etc

The date, time, line numbers, and process PIDs are real; the rest of the data has been replaced with abstract values.
If task_id = line, the lines above are merged into a single event, which is wrong.
If I use task_id = pid, the result is even worse, because a single PID handles many lines with different numbers.

Can I use a double task_id (line and pid)? How?

Yes, you can!
It's easy:

task_id => "%{pid}%{line}"
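
One small caveat: since %{pid} and %{line} are both plain numbers, concatenating them with no separator can in theory produce ambiguous ids (pid 21 with line 785 and pid 217 with line 85 both give "21785"). A delimiter rules that out:

    # a delimiter keeps the pid/line pairs unambiguous, e.g. "24977-137"
    task_id => "%{pid}-%{line}"

And if PIDs can coincide across your servers, adding %{logsource} to the task_id as well would prevent cross-host collisions.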

Oeeeah! Many thanks.
You've been a really big help. Great!

Nice to see that I helped you!
You're welcome!