When processing stdin, any equivalents to awk's BEGIN and END?

awk has special actions, BEGIN and END, that run before the first input record is read and after the last one has been processed (gawk also offers BEGINFILE and ENDFILE for each individual input file).

(There are also variables maintained automatically by the tool, such as NR.)

Is there any equivalent in Logstash? For example, when working on filters, it is very useful to see the throughput -- events per second -- but using external tools (like time(1)) is imprecise, because Logstash is very slow to start up.

Can I somehow start a counter when the first event is read from stdin and stop it when the processing ends?

I suppose I can use ruby-filter's init-clause to capture the beginning, but how would I report the results at the end -- without reporting them for every event?

This is a great question! I think the best way is to use a config like:

 bin/logstash -b 500 -w 1 -e 'input { tcp { port => 1234 } } filter { my_filter {} }' > /dev/null

Then, to benchmark it, once Logstash has booted, run this in another terminal:

time nc localhost 1234 < lots_of_test_data.log

The idea here is that, with the in-memory queue, there won't be much buffering, so for a reasonably sized amount of test data the output of time will only be off by whatever Logstash has buffered in memory, which is only about the value of -b. Alternatively, you could use pv to measure the rate, though keep in mind you need its -a flag to average the rate over the full run; by default its rate is a sliding window, IIRC.
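If you would rather get lines per second directly instead of dividing by hand, the same measurement can be sketched in plain Ruby. This is only a rough sketch: pump_lines is a hypothetical helper name, not part of Logstash or any library, and it measures how fast the sender could hand lines to the socket, so it carries the same buffering error as time nc.

```ruby
require 'socket'

# Hypothetical helper (not a Logstash API): copy every line from `source`
# to `sink` and return the line count, elapsed seconds, and the average
# lines per second over the whole run.
def pump_lines(source, sink, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  started = clock.call
  lines = 0
  source.each_line do |line|
    sink.write(line)
    lines += 1
  end
  sink.flush
  elapsed = clock.call - started
  # Avoid dividing by zero on an empty or instantaneous run.
  rate = elapsed > 0 ? lines / elapsed : nil
  { lines: lines, seconds: elapsed, rate: rate }
end

# Example use against the tcp input from the command above
# (port and file name are taken from that example):
#   TCPSocket.open('localhost', 1234) do |sock|
#     File.open('lots_of_test_data.log') do |f|
#       stats = pump_lines(f, sock)
#       warn format('%d lines in %.2fs', stats[:lines], stats[:seconds])
#     end
#   end
```

The clock is injectable only so the helper can be exercised without a real socket or real waiting; in normal use the default monotonic clock is what you want.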

Yes, that would work too, I suppose... But back to my original question: a ruby-filter's init-clause is the equivalent of awk's BEGIN, right? And there is no equivalent of awk's END, is there?

BTW, my current performance-meter is thus:

filter {
	ruby {
		init => '
			$lines = 0
			StartedAt = Time.now.to_f
		'
		code => '
			$lines += 1
			if $lines % 100 == 0
				spent = Time.now.to_f - StartedAt
				if spent > 0	# Just in case, avoid dividing by zero
					printf("\r%d lines at %.2f lines per second\t", $lines, $lines / spent)
				end
			end
		'
	}
}

This is neat and pretty when running Logstash interactively, but it cannot output the final summary, because there is no END-equivalent... Perhaps there should be?

Logstash is intended to be a processing system for an endless stream of data, so in many ways there is no "beginning" and there is no "end".

We made a special case for a few plugins (stdin, for example) that can have an "end", but stdin does not emit any "end" event. You could fork or patch the plugin to add this capability if you need it.
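Short of patching the plugin, one thing that might be worth trying is Ruby's own at_exit hook, which is the closest thing plain Ruby has to awk's END. The sketch below is an assumption-laden illustration, not a Logstash feature: ThroughputMeter is a hypothetical class, and I have not verified that at_exit hooks registered from a ruby filter actually run when Logstash shuts down.

```ruby
# Hypothetical helper (not part of Logstash): counts events and can print
# a single summary line when the Ruby process exits.
class ThroughputMeter
  attr_reader :count

  def initialize(out: $stderr, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @out = out
    @clock = clock
    @count = 0
    @started_at = nil
  end

  # Call once per event; the timer starts on the first event (the "BEGIN").
  def tick
    @started_at ||= @clock.call
    @count += 1
  end

  # Average events per second since the first event, or nil if no time
  # has been measured yet.
  def rate
    return nil if @started_at.nil?
    spent = @clock.call - @started_at
    spent > 0 ? @count / spent : nil
  end

  # The "END": print one summary line.
  def report
    r = rate
    if r
      @out.printf("%d lines at %.2f lines per second\n", @count, r)
    else
      @out.printf("%d lines\n", @count)
    end
  end

  # Register the summary to run when the Ruby process exits.
  def report_at_exit
    at_exit { report }
    self
  end
end
```

In a standalone script this would be used as meter = ThroughputMeter.new.report_at_exit followed by meter.tick for every line read. Inside a ruby filter, the class would have to be defined in init (or loaded from a file) and tick called from code; whether the summary then appears on shutdown is exactly the part that needs testing.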

I'd say stdin and file inputs could generate BEGIN and END events every time they begin and end processing... Maybe I will create a pull-request... Thanks!
