Hello everyone.
I am here again. I would like to start by thanking every single one of you for this amazing community.
These days I have been testing Elasticsearch and Logstash, and I am falling in love with how many things I can achieve with these two tools; it is impressive.
In my testing environment, which uses Elasticsearch, Logstash and Grafana, I am trying to aggregate similar events within a specific time range to save disk space and optimise data visualisation. To explain myself better, I will give an example.
Currently I have some junk syslog messages generated by the Kiwi Syslog Generator. The fields are:
timestamp
message
What I want to do is: if I have 2 identical messages generated in the last 10 minutes, group them into 1 line and add a count column that reflects how many times that message occurred in those 10 minutes.
example:
before:

timestamp   message
13:54:24    hello
13:54:35    hello

after:

timestamp   message   count
13:54:35    hello     2
I checked the documentation and I see Logstash offers the aggregate filter plugin, but I was wondering if there is an option to specify a timespan within which those events must occur.
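Something like this is what I had in mind, assuming the plugin's timeout option is the right way to express that timespan (the count field name here is just my own placeholder):

filter {
  aggregate {
    task_id => "%{message}"
    # keep a running count of identical messages
    code => "map['count'] ||= 0; map['count'] += 1"
    # emit one summary event per message once the timespan expires
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "message"
    timeout => 600   # 10 minutes
    timeout_tags => ['_aggregatetimeout']
  }
}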
Thank you very much for your time
EDIT:
I went through the documentation and implemented the timeout aggregation as follows:
input {
  syslog {
    port => 514
  }
}

filter {
  # keep only the fields I care about
  prune {
    whitelist_names => ["timestamp", "message", "newfield", "count_message"]
  }

  # helper field that concatenates the timestamp and the message
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }

  # count identical messages and push a summary event when the timeout expires
  if [message] =~ "MESSAGE" {
    aggregate {
      task_id => "%{message}"
      code => "map['message'] ||= 0; map['message'] += 1;"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "message"
      timeout => 60
      inactivity_timeout => 50
      timeout_tags => ['_aggregatetimeout']
      timeout_code => "event.set('count_message', event.get('message') > 1)"
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
The output is similar to what I am expecting, but not 100% correct. The actual output duplicates every row, adding a _aggregatetimeout tag to the duplicates.
example:
if I have these 3 logs:

timestamp   message
13:54:24    MESSAGE
13:54:35    MESSAGE
13:54:40    MESSAGE

as a result I am getting:

timestamp   message   tags
13:55:24    MESSAGE   _aggregatetimeout
13:55:24    MESSAGE   _aggregatetimeout
13:55:24    MESSAGE   _aggregatetimeout
13:54:24    MESSAGE
13:54:35    MESSAGE
13:54:40    MESSAGE
Can anyone please help me understand how I can get the count of duplicate events in a specific time range?
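For what it is worth, this is the direction I was thinking of trying next: keep the counter in its own map key instead of reusing 'message', use a 600 second timeout for the 10-minute window, and drop the original events so only the aggregated line is kept. Is this the right approach? It is just an untested sketch:

filter {
  if [message] =~ "MESSAGE" {
    aggregate {
      task_id => "%{message}"
      # count occurrences in a dedicated key so 'message' is not overwritten
      code => "map['count_message'] ||= 0; map['count_message'] += 1"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "message"
      timeout => 600   # 10-minute window
      timeout_tags => ['_aggregatetimeout']
    }
    # drop the original events so only the aggregated summary reaches the output
    if "_aggregatetimeout" not in [tags] {
      drop {}
    }
  }
}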