Hello,
I'm processing some historical apache logs using logstash and attempting to use the aggregate filter to identify double-clicks. In this case, a double-click is identified as the same request, from the same client IP and user-agent, occurring within 30 seconds of each other.
Here is example apache log entries that I am processing, with IP and request details anonymised:
Here is a reduced version of my logstash config:
And here is the output I am getting:
Looking at the original Apache log data, none of the events seem to occur within 30 seconds of each other. In the logstash config I am setting the aggregate filter to a 30-second timeout (this is based on the 'iso_timestamp' field, which is an ISO date created from the timestamp in the actual log entry).
Given the above information I would have expected there to be an output event for each of the log entries, and no double-clicks - however, this appears in the output:
{
"_id" : ObjectId("5bc72b21962d740353000006"),
"@timestamp" : ISODate("2018-10-17T12:29:21.251Z"),
"task_id" : "0.0.0.0Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko/results?terms=test",
"@version" : "1",
"double_click" : true,
"clicks" : 2,
"iso_timestamps" : [
{
"iso_timestamp" : ISODate("2018-10-08T14:18:57Z")
},
{
"iso_timestamp" : ISODate("2018-10-08T14:14:08Z")
}
]
}
Here these events seem to have been aggregated despite the timestamps being over 4 minutes apart.
I should add that i've also tried using an inactivity_timeout, which yielded similar results.
Yet for some of my other log processing, it seems to correctly identify double-clicks as defined above.
I am most likely doing something very wrong, but I can't see what it is!
Any help would be gratefully received.
Thanks