When processing stdin, any equivalents to awk's BEGIN and END?

UnitedMarsupials · July 18, 2017, 4:20pm

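awk has special actions triggered at the BEGINning and END of input processing: BEGIN runs before the first record is read from stdin (or from the input files), and END runs after the last one. (gawk additionally offers BEGINFILE and ENDFILE for each individual input file.)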

(There are also variables maintained automatically by the tool, such as NR, the number of records read so far.)
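To make the awk model concrete in the same language Logstash's ruby filter uses, here is a minimal plain-Ruby sketch of a BEGIN / per-record / END loop with an NR-style counter. It only illustrates awk's semantics and is not a Logstash API; the helper `each_record_with_hooks` and its keyword arguments are hypothetical names chosen for this example.

```ruby
require "stringio"

# Hypothetical helper mimicking awk's program structure:
#   BEGIN { ... }  -> on_begin, called once before any input is read
#   { ... }        -> on_record, called for every line with an NR-style counter
#   END { ... }    -> on_end, called once after the last line with the final count
def each_record_with_hooks(io, on_begin:, on_record:, on_end:)
  on_begin.call
  nr = 0
  io.each_line do |line|
    nr += 1 # like awk's NR: records read so far
    on_record.call(line.chomp, nr)
  end
  on_end.call(nr)
end

trace = []
each_record_with_hooks(
  StringIO.new("a\nb\nc\n"),
  on_begin:  -> { trace << "BEGIN" },
  on_record: ->(line, nr) { trace << "#{nr}:#{line}" },
  on_end:    ->(nr) { trace << "END after #{nr} records" }
)
puts trace
```

Ruby itself borrowed awk's `BEGIN { }` and `END { }` blocks and the `$.` line counter, so a one-liner such as `ruby -ne 'END { puts $. }' < file` prints the number of lines read.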

Is there any equivalent in Logstash? For example, when working on filters, it is very useful to see the throughput (events per second), but using external tools such as time(1) is imprecise, because Logstash is very slow to start up.

Can I somehow start a counter when the first event is read from stdin and stop it when processing ends?

I suppose I can use the ruby filter's init clause to capture the beginning, but how would I report the results at the end without reporting them for every event?
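The counter described here can be separated into a per-event hook and an end-of-input hook. The plain-Ruby sketch below starts the clock on the first event rather than at init time, so startup cost is excluded. `ThroughputMeter`, `record` and `summary` are hypothetical names; the clock is injectable only so the example can be checked without real timing. Nothing in Logstash is assumed to call `summary`; that missing END hook is exactly the open question.

```ruby
# Hypothetical throughput meter: the clock starts on the first event
# rather than at init time, so a slow startup is not counted.
class ThroughputMeter
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @count = 0
    @started_at = nil
  end

  # Per-event action: the first call starts the clock.
  def record
    @started_at ||= @clock.call
    @count += 1
  end

  # END action: returns [events, elapsed_seconds, events_per_second].
  def summary
    return [0, 0.0, 0.0] if @started_at.nil?
    elapsed = @clock.call - @started_at
    rate = elapsed > 0 ? @count / elapsed : 0.0
    [@count, elapsed, rate]
  end
end

# Usage with a fake clock: the first event reads 100.0, the summary reads 102.0.
ticks = [100.0, 102.0].each
meter = ThroughputMeter.new(-> { ticks.next })
4.times { meter.record }
count, elapsed, rate = meter.summary
puts format("%d events, %.1f s, %.1f events/s", count, elapsed, rate)
```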

Andrew_Cholakian1 · July 21, 2017, 3:56pm

This is a great question! I think the best way is to use a config like:

 bin/logstash -b 500 -w 1 -e 'input { tcp { port => 1234 } } filter { my_filter {} }' > /dev/null

Then, to benchmark it, once Logstash has booted, run this in another terminal:

time nc localhost 1234 < lots_of_test_data.log

The idea here is that with the in-memory queue there won't be much buffering, so for a reasonably sized amount of test data the output of time will only be off by whatever Logstash has buffered in memory, which is roughly the batch size set with -b (500 events with the command above). Alternatively, you could use pv to measure the rate, though keep in mind that you need its -a flag to average the rate over the full run; by default its rate is a sliding window, IIRC.
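The difference between an average over the whole run and a sliding-window rate matters when throughput is uneven. A minimal Ruby sketch, assuming a list of event arrival times in seconds; `average_rate` and `window_rate` are hypothetical helpers, and pv's own implementation may compute its rates differently.

```ruby
# Average rate over the whole run: n events span (n - 1) intervals.
def average_rate(times)
  return 0.0 if times.size < 2
  (times.size - 1) / (times.last - times.first)
end

# Rate over only the most recent `window` seconds.
def window_rate(times, window)
  cutoff = times.last - window
  times.count { |t| t > cutoff } / window.to_f
end

# A fast burst (10 events in the first second) followed by a slow tail.
times = (0..9).map { |i| i / 10.0 } + [2.0, 3.0, 4.0]
puts average_rate(times)     # → 3.0
puts window_rate(times, 2.0) # → 1.0
```

The windowed figure only reflects the slow tail, while the average covers the burst too, which is why the full-run average is the number to compare between filter variants.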

UnitedMarsupials · July 21, 2017, 4:14pm

Yes, that would work too, I suppose... But back to my original question: a ruby filter's init clause is the equivalent of awk's BEGIN, right? And there is no equivalent of awk's END, is there?

BTW, my current performance meter is this:

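filter {
	ruby {
		# init runs once, when the filter is registered: the closest thing to awk's BEGIN.
		# Instance variables (rather than globals and a reassigned constant) persist between events.
		init => '
			@lines_seen = 0
			@started_at = Time.now.to_f
		'
		# code runs once per event. The counter is not synchronized,
		# so this assumes a single pipeline worker (-w 1).
		code => '
			@lines_seen += 1
			if @lines_seen % 100 == 0
				spent = Time.now.to_f - @started_at
				if spent > 0	# Just in case, avoid dividing by zero
					printf("\r%d lines at %.2f lines per second\t", @lines_seen, @lines_seen / spent)
				end
			end
		'
	}
}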

This is neat and pretty when running Logstash interactively, but it cannot output a final summary, because there is no END equivalent... Perhaps there should be one?
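In a standalone Ruby script, the END role is played by `Kernel#at_exit`, which registers a block to run once when the interpreter exits. The sketch below shows that pattern in plain Ruby; whether such a hook would fire, and where its output would go, when Logstash (running on JRuby) shuts down a pipeline is an unverified assumption, so treat it as an experiment rather than a known Logstash feature. `format_summary` is a hypothetical helper.

```ruby
# Hypothetical helper: builds the one-line summary wanted at END.
def format_summary(lines, seconds)
  rate = seconds > 0 ? lines / seconds.to_f : 0.0
  format("%d lines in %.2f s (%.2f lines per second)", lines, seconds, rate)
end

lines = 0
started_at = Time.now.to_f

# Registered now, run once when the Ruby interpreter exits (the END role).
# Not verified to fire when Logstash shuts down a pipeline.
at_exit do
  $stderr.puts format_summary(lines, Time.now.to_f - started_at)
end

# Per-event work, standing in for the ruby filter's `code`.
1000.times { lines += 1 }
```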

jordansissel · July 21, 2017, 8:28pm

Logstash is intended to be a processing system for an endless stream of data, so in many ways there is no "beginning" and there is no "end".

We made a special case for a few plugins (stdin, for example) that can have an "end", but stdin does not emit an "end" event. You could fork or patch the plugin to add this capability if you need it.

UnitedMarsupials · July 24, 2017, 3:03pm

I'd say stdin and file inputs could generate BEGIN and END events every time they begin and end processing... Maybe I will create a pull request... Thanks!
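One way to picture this proposal is a plain-Ruby sketch of an input that emits a synthetic marker event before the first line and after the last one, with a consumer that treats those markers like awk's BEGIN and END. The `marker` field and the `each_event_with_markers` method are invented for illustration; the real stdin and file inputs do not emit such events.

```ruby
require "stringio"

# Hypothetical input: yields a BEGIN marker, one event per line, then an END marker.
# Events are yielded one at a time, so this also works for long streams.
def each_event_with_markers(io)
  yield({ "marker" => "BEGIN" })
  io.each_line { |line| yield({ "message" => line.chomp }) }
  yield({ "marker" => "END" })
end

# Consumer: resets on BEGIN, counts ordinary events, reports once on END.
count = 0
summary = nil
each_event_with_markers(StringIO.new("x\ny\nz\n")) do |event|
  case event["marker"]
  when "BEGIN" then count = 0
  when "END"   then summary = "#{count} events"
  else count += 1
  end
end
puts summary
```

In a Logstash pipeline, a filter could then branch on such a field with a conditional like `if [marker] == "END" { ... }`, assuming an input that actually set it.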

system · August 21, 2017, 3:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
