multiline {
  pattern => "... [\d+-1]" # adjust pattern to be more exact
  negate => true
  what => "previous"
}
but I have:
line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1]...
line 5: ...[3780-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...
That's not what I need. I want:
line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1] and [138-2]...
line 5: ...[3780-1]...
line 6: ...[139-1]...
line 7: ...[2954-1]...
or
line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[3780-1]...
line 5: ...[138-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...
I would appreciate it if anyone could share hints or their experience with this.
OK, given your Logstash configuration, I now understand your need.
Your main problem is that you have no explicit "end line" marker with which to flush the aggregate map.
But you're in luck! Last week, a new version of the aggregate plugin was released with options to deal with exactly that case!
So here's the right configuration for your need:
grok {
  match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: \[%{INT:line}-%{INT:part_of_line}\] %{GREEDYDATA:ostatok}" ]
}
aggregate {
  task_id => "%{line}"
  code => "
    map.merge!(event) if map.empty?
    map['full_message'] ||= ''
    map['full_message'] += event['ostatok']
  "
  timeout => 10
  push_map_as_event_on_timeout => true
  timeout_code => "event.tag('aggregated')"
}
if "aggregated" not in [tags] {
  drop {}
}
Note that in your grok expression, the [ and ] characters must be escaped.
As for aggregate, the main idea is that, since you have no explicit "end log line", we use a 10 s timeout to push the aggregated map into the pipeline as a new Logstash event.
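To see why the escaping matters: unescaped, [\d+-1] is a regex character class matching a single character (a digit, or anything in the '+'..'1' range), not the literal [NNN-1] marker. A minimal Ruby sketch with made-up sample lines:

```ruby
# Unescaped: a character class that matches ONE character anywhere in the line.
unescaped = /[\d+-1]/
# Escaped: matches a literal bracketed marker such as "[137-1]".
escaped   = /\[\d+-1\]/

with_marker    = "... [137-1] and [137-2]..."
without_marker = "plain line 5 with no marker"

puts unescaped.match?(with_marker)     # true
puts unescaped.match?(without_marker)  # true (matches the bare digit 5)
puts escaped.match?(with_marker)       # true
puts escaped.match?(without_marker)    # false
```

So the unescaped pattern fires on almost every line, which is why the multiline filter grouped lines the way it did.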
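The accumulation logic inside the code block can be sketched in plain Ruby. This is a simplified illustration with hypothetical events modeled as hashes, not the plugin's actual internals; in real Logstash, map and event are managed by the aggregate filter:

```ruby
# Hypothetical events sharing the same "line" task_id, split across two parts.
events = [
  { "line" => "138", "part_of_line" => "1", "ostatok" => "first part " },
  { "line" => "138", "part_of_line" => "2", "ostatok" => "second part" },
]

maps = Hash.new { |h, k| h[k] = {} } # one map per task_id

events.each do |event|
  map = maps[event["line"]]
  map.merge!(event) if map.empty?    # seed the map with the first event's fields
  map["full_message"] ||= ""
  map["full_message"] += event["ostatok"]
end

# On timeout, the plugin pushes the map as a new event tagged "aggregated".
puts maps["138"]["full_message"] # => "first part second part"
```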
Thank you very much. It works for me.
But after implementing your settings, I discovered a problem that was not obvious before.
My log is very large and is forwarded from many servers, so line numbers coincide. Example:
Aug 11 11:34:53 my.host1 example.com1[21872]: [198-1] 2016-08-11 11:34:53.029 MSK etc
Aug 11 11:34:53 my.host1 example.com1[21878]: [198-1] 2016-08-11 11:34:53.150 MSK etc
Aug 11 11:34:53 my.host1 example.com1[21879]: [198-1] 2016-08-11 11:34:53.515 MSK etc
Aug 11 11:34:53 my.host3 example.com3[16548]: [17198-1] 2016-08-11 11:34:53.529 MSK etc
Aug 11 11:34:53 my.host2 example.com2[19241]: [198-1] 2016-08-11 11:34:53.722 MSK etc
Aug 11 11:34:53 my.host2 example.com2[19017]: [198-1] 2016-08-11 11:34:53.873 MSK etc
Aug 11 11:34:54 my.host1 example.com1[21901]: [198-1] 2016-08-11 11:34:54.091 MSK etc
The dates, times, line numbers, and process PIDs are real; the rest of the data has been replaced with abstract values.
If task_id = line, the lines shown above are merged into a single event, which is wrong.
If I use task_id = pid instead, the result is even worse, because a single PID handles many lines with different numbers.
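One possible fix (a sketch, not from the original thread) is to make the task_id unique per source by combining several fields. Since the grok pattern above already extracts logsource, and %{SYSLOGPROG} populates a pid field, the aggregate filter could use:

```
aggregate {
  task_id => "%{logsource}_%{pid}_%{line}"
  ...
}
```

This way, two processes on different hosts (or two PIDs on the same host) that happen to use the same line number no longer share an aggregate map.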