Aggregate Filter Timeout issue


(Dan At Mimas) #1

Hello,

I'm processing some historical apache logs using logstash and attempting to use the aggregate filter to identify double-clicks. In this case, a double-click is identified as the same request, from the same client IP and user-agent, occurring within 30 seconds of each other.

Here is example apache log entries that I am processing, with IP and request details anonymised:

Test logs

Here is a reduced version of my logstash config:

logstash config

And here is the output I am getting:

aggregation output

Looking at the original Apache log data, none of the events seem to occur within 30 seconds of each other. In the logstash config I am setting the aggregate filter to a 30-second timeout (this is based on the 'iso_timestamp' field, which is an ISO date created from the timestamp in the actual log entry).

Given the above information I would have expected there to be an output event for each of the log entries, and no double-clicks - however, this appears in the output:

{
	"_id" : ObjectId("5bc72b21962d740353000006"),
	"@timestamp" : ISODate("2018-10-17T12:29:21.251Z"),
	"task_id" : "0.0.0.0Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko/results?terms=test",
	"@version" : "1",
	"double_click" : true,
	"clicks" : 2,
	"iso_timestamps" : [
		{
			"iso_timestamp" : ISODate("2018-10-08T14:18:57Z")
		},
		{
			"iso_timestamp" : ISODate("2018-10-08T14:14:08Z")
		}
	]
}

Here these events seem to have been aggregated despite the timestamps being over 4 minutes apart.

I should add that i've also tried using an inactivity_timeout, which yielded similar results.

Yet for some of my other log processing, it seems to correctly identify double-clicks as defined above.

I am most likely doing something very wrong, but I can't see what it is!

Any help would be gratefully received.

Thanks


(Dan At Mimas) #2

I have possibly made some progressed, by not setting iso_timestamps in the timeout_code. Not sure why I was doing this in the first place. This seems to now yield the expected results. I just need to confirm that genuine double-clicks are still picked up.

I have updated the gists accordingly


(Dan At Mimas) #3

Ok, so weirdly the aggregation doesn't seem to consistently produce the same results.

I've run a couple of times with the same data set and configuration and received different results each time, sometimes

My only guess at this point is something to do with the number of filter workers I am using (i.e. > 1), however I tried setting the number of workers to 1 in the logstash config, and everything just seems to grind to a halt. And even if this is the case, why is the timeout value not being respected?

Help!


(Dan At Mimas) #4

It looks like it was an issue with using multiple workers.

The problem I was having was that when I reduced workers to 1 logstash seemed to grind to a halt, and wouldn't come back up.

I am currently running logstash inside a Vagrant VM, and after investigating further I discovered that while I had the vm configured for multiple cores, I needed this configuration to get it to work effectively:

vb.customize ["modifyvm", :id, "--ioapic", "on"]

After setting that, and lowering the logstash pipeline workers to 1, everything seems to now work as expected.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.