Handling / dropping files according to previously handled files

I'm having an issue with how to deal with a batch of files.

The lines of each file contain different pieces of information:

  • the first line acts as a header (with a recognizable prefix) and includes (among other things) a unique ID

  • the other lines (with a different prefix) must be analyzed and sent to Elasticsearch if and only if the unique ID from the first line has never been seen before in any other file
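
To illustrate, here is a hypothetical two-file batch (the prefixes, separator, and HDR/DAT names are invented for the example, not taken from the real data):

```
file1.log:
HDR;product_serial=SN-001
DAT;first data line        <- sent to ES (ID never seen before)
DAT;second data line       <- sent to ES

file2.log:
HDR;product_serial=SN-001  <- same ID as file1.log
DAT;more data              <- must be dropped
```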

So far, I have used the following Ruby code to store and verify the ID:

ruby {
	init => '
		if @@alreadymet=nil
			@@alreadymet = []
		end
		@@canGo="false"
	'
	code => '
		unless @@alreadymet&.include?(event.get("product_serial"))
			@@alreadymet.push(event.get("product_serial"))
			@@canGo="true"
		end
		event.set("can_go",@@canGo)
	'
} 

However, when I then test the can_go field on that same first line, it never evaluates to true:

if [can_go] =~ /^true/ { [...]

I'm quite new to Logstash and Ruby and I guess I'm doing something wrong...
Any help appreciated 🙂

There is no reason to use class variables (@@canGo, @@alreadymet) here. You can use instance variables (@canGo, etc.) instead.

Once @@canGo gets set to true it never gets reset to false. Its value should depend on whether @@alreadymet includes the value of product_serial.
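
The intended logic can be sketched in plain Ruby (the event is simulated with a bare serial string, and a Set replaces the Array for cheaper membership checks; both are assumptions for the sketch, not your actual filter):

```ruby
require 'set'

# IDs seen so far -- stands in for the filter's stored state.
alreadymet = Set.new

# Recompute the flag for every event: "true" only on the first
# sighting of a serial, "false" on any repeat.
can_go = lambda do |serial|
  if alreadymet.include?(serial)
    "false"
  else
    alreadymet.add(serial)
    "true"
  end
end

puts can_go.call("SN-001")  # => "true"  (first sighting)
puts can_go.call("SN-001")  # => "false" (already met)
puts can_go.call("SN-002")  # => "true"  (new serial)
```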

You are making assumptions about the ordering of events that may not be true unless you set pipeline.workers to 1 and disable the java execution engine. Even then you appear to be assuming that the file input will read each file in turn, and I do not think that is guaranteed.

The normal way to write

	if @@alreadymet=nil
		@@alreadymet = []
	end

is

  @@alreadymet ||= []

But since this is in the init block it only runs once, so you can simply use

@@alreadymet = []
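
Putting that together, the whole filter could look like this (a sketch using instance variables as suggested, assuming pipeline.workers is 1 and that, as in your setup, this filter only runs against the header lines that carry product_serial):

```
ruby {
	init => '
		@alreadymet = []
	'
	code => '
		serial = event.get("product_serial")
		if @alreadymet.include?(serial)
			event.set("can_go", "false")
		else
			@alreadymet.push(serial)
			event.set("can_go", "true")
		end
	'
}
```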

OK, I wasn't sure about Logstash's behavior (i.e. whether an instance corresponds to a processor or to the processing of a given file).

I did set pipeline.workers to 1; however, I didn't know about the mentioned bug. So if I understand correctly, I should pass --java-execution=false on the command line?

I don't understand your remark: if there is only one pipeline worker, doesn't that mean that each file will be handled after the previous one has been dealt with?

Anyway, thanks for your reply! I will try to change to instance variables ASAP to see if it solves my issue.

That remark was not about the pipeline, it was about the input.

OK, I'm not sure it solves the issue you mentioned, but I used the following option in the file input:

file_sort_by => "path"

OK, I figured out what my error was: when analyzing the "other" lines (i.e. not the first one) I simply forgot to initialize the can_go field correctly, duh!

I just had to add this before checking can_go:
ruby {code => 'event.set("can_go",@@canGo)'}

@Badger: by the way, I HAD to use class variables, otherwise they would not stick from file to file...

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.