Retaining some data across different log events

This seems to be a particularly popular topic, and I got some useful examples from other forum posts, but they fail to live up to my expectations in an unexpected way.

So the task at hand is this: I have a collection of syslog files where, alongside the usual activity, certain events are marked like this:

DEBUG MARKER: == sanity test 0d: ....

There's no end marker; the end of a particular test is marked by the beginning of the next one.

The filter I have looks like this:

filter {
  grok {
    match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:syslog_message}" }
  }

  if [syslog_message] =~ /^DEBUG MARKER: == .* ===========================/ {
    # Marker line: extract the test and subtest names
    dissect {
      mapping => { "syslog_message" => "DEBUG MARKER: == %{test} test %{subtest}: " }
    }
    # Remember them in class variables for the events that follow
    ruby {
      code => "@@subtest = event.get('subtest')
               @@test = event.get('test')"
    }
  } else {
    # Any other line: tag it with the most recently remembered values
    ruby {
      init => "@@test = 'startup'
               @@subtest = ''"
      code => "event.set('subtest', @@subtest)
               event.set('test', @@test)"
    }
  }
}

When I run it, the subtest information is parsed correctly, and that's about where the good part stops. The subtest values are then assigned seemingly at random all over the log file, as if the @@subtest variable were globally visible across multiple threads processing the file (and it's always a large static log file for me).
So I see, say, a subtest 5 mark on early system boot messages, where it clearly should still be "startup", or "startup" somewhere in the middle of the file, where a lot of processing has already happened.

Any ideas here? Thanks!

So after some playing around, I found that reducing the number of pipeline workers to 1 seems to help, so the issue is indeed some sort of thread unsafety. Are there any other practical workarounds?

Class variables are globally visible! They are shared across all ruby filters in all threads.
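To illustrate the sharing described above, here is a minimal plain-Ruby sketch that runs outside Logstash. The FakeFilter class and its remember/recall methods are hypothetical stand-ins, not Logstash APIs; the only point is that a @@ variable belongs to the class, so every instance and every thread sees the same copy.

```ruby
# Hypothetical stand-in for a filter plugin class; this is not Logstash code.
class FakeFilter
  @@test = 'startup' # class variable: one copy shared by the whole class

  def remember(value)
    @@test = value
  end

  def recall
    @@test
  end
end

marker_filter = FakeFilter.new # plays the role of the "if" branch
tagger_filter = FakeFilter.new # plays the role of the "else" branch

# A write made on another thread through one instance...
Thread.new { marker_filter.remember('sanity') }.join

# ...is visible through a different instance on the main thread.
seen = tagger_filter.recall
puts seen
```

Nothing here is per-instance or per-thread, which is consistent with the values leaking between workers in the question.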

Generally, if you are doing this kind of aggregation in ruby, you will need to be single threaded.
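A rough sketch of why ordering matters, again in plain Ruby rather than Logstash: the carry-forward logic only gives the right answer when lines are visited strictly in file order. The tag_lines helper, the MARKER pattern, and the sample lines (other than the marker format quoted in the question) are assumptions made up for illustration.

```ruby
# Hypothetical pattern for the marker format quoted in the question.
MARKER = /DEBUG MARKER: == (?<test>\S+) test (?<subtest>[^:]+):/

# Hypothetical helper: tags each line with the most recent marker seen so far.
def tag_lines(lines)
  test = 'startup' # same defaults as the init block in the question
  subtest = ''
  lines.map do |line|
    if (m = MARKER.match(line))
      test = m[:test]
      subtest = m[:subtest]
    end
    { line: line, test: test, subtest: subtest }
  end
end

# Invented sample input, processed strictly in order.
log = [
  'kernel: booting',
  'DEBUG MARKER: == sanity test 0d: ....',
  'app: doing work',
  'DEBUG MARKER: == sanity test 1a: ....',
  'app: more work'
]

tagged = tag_lines(log)
tagged.each { |t| puts "#{t[:test]}/#{t[:subtest]} #{t[:line]}" }
```

If several workers pulled batches of these lines concurrently, the "most recent marker" would depend on scheduling rather than on position in the file, which matches the symptoms described above.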

I see. Thanks!

A feature to declare "this set of filters is single threaded, always run them in the same worker thread" might come in handy, I guess, so that the rest of the operations don't need to be compromised.

(I am thinking of a case with a bunch of logs to ingest, where some need such processing and some don't, and where there are multiple files, so there could be something like a thread per file, with properly isolated local variables too, of course.)
