Keeping global variables in LS?!

Is there a way to keep variables at a global level?!

Here is an excerpt from the log file that we are processing. As you can see, there is data about CustomerAccountNumber and ContractNumber. The trouble is that the key piece of data we need to tie all of this together is only available on one of the log lines above (not shown).

So this raises the question: is there a way to preserve some key piece of data from earlier lines somewhere inside the pipeline, so we could refer to it later when we have the Customer Account data, Contract number, etc.?!

The @metadata comes close, but as far as I know that's available per Event, so that doesn't help me here.

Customer Account Number=233322    Contract Number=CFTFGXR    Work Number=W336C    Status=Billing Failure
    Billing: ID=68183    Status=Billing Failure    Billing Date=2016-01-28    Billing Amount=3494.60    Invoice#7328807    BCC=1B13
        CF11130E=Incident billing dates not within work number start and end dates 
        CF11146E=Incident charge bill thru date is later than work number end date

I'm guessing that's not possible...

Well you didn't really wait too long for an answer :slight_smile:

So you are treating each of these lines as unique events?

Not sure how to answer your question about "unique events" - they are unique, yet they are all related to an Invoice in a Request. So the hierarchy of the data is something like this:

These separate lines in the log are all related to a specific RequestId that is listed only once somewhere in the log above. By the time I read the customerAccountNumber, contractNumber, and workNumber I no longer have 'visibility' or knowledge of the RequestId from above...

Customer Account Number=233322 Contract Number=CFTFGXR Work Number=W336C Status=Billing Failure
Billing: ID=68183 Status=Billing Failure Billing Date=2016-01-28 Billing Amount=3494.60 Invoice#7328807 BCC=1B13

RequestId > has many Invoices, and Invoice has customerAccount#, Contract#, Work#, Billing#, Status, BillingDate, Amount, Invoice#, BCC

There are multiple RequestIds in the same log, each of them having numerous Invoices under it.

Was thinking - if I come across the RequestId and have a way to preserve it across multiple Events, then I can relate all subsequent invoice details to that same RequestId until it changes. When it changes, I would then mark all details that follow to that new RequestId, and so on to the end.
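The carry-forward logic I have in mind, sketched in plain Ruby (the line contents and RequestId values here are made-up stand-ins for my actual log):

```ruby
# Remember the most recent RequestId seen, and stamp it onto every
# subsequent detail line until a new RequestId shows up.
lines = [
  'RequestId=REQ-001',
  'Customer Account Number=233322    Contract Number=CFTFGXR',
  'RequestId=REQ-002',
  'Customer Account Number=555555    Contract Number=ABCDEF'
]

last_request_id = nil
decorated = []
lines.each do |line|
  if line =~ /RequestId=(\S+)/
    last_request_id = Regexp.last_match(1)   # new request: remember it
  else
    decorated << "#{line}    RequestId=#{last_request_id}"
  end
end
# decorated[0] now carries REQ-001, decorated[1] carries REQ-002
```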

Hope that makes it more clear.

I use an approach like this to carry the year forward from one event to all the rest until I get a new year, grouped by input file:

        if ([message] =~ /^started/) {
            ruby{
                init => "@@map = {}"
                code => "@@map[event['path']] = Time.at(event['timestamp'].to_f).year"
            }
        } else {
            ruby{
                code => "event['timestamp'].gsub!(/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/,(@@map[event['path']] || Time.now.year).to_s+' \1')"
            }
        }
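In plain Ruby, the gsub substitution in that filter behaves like this (hypothetical path and timestamp values):

```ruby
# The filter prefixes the remembered year onto the month abbreviation,
# using \1 in the replacement string to re-insert the matched month name.
map = { '/var/log/app.log' => 2016 }   # year remembered per input file
path = '/var/log/app.log'
timestamp = 'Feb 10 13:37:00'

timestamp.gsub!(/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/,
                (map[path] || Time.now.year).to_s + ' \1')
# timestamp is now "2016 Feb 10 13:37:00"
```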

You can probably adapt the approach to your situation.


Camden -

Thx for sharing. Looking at this code with my beginner Ruby skills, it looks like you initialize a map structure, into which you add a year value that you extract from the event's timestamp.

The else clause seems to replace all instances of Jan, Feb, ..., Dec with the year that you've stored in the map previously, or the current year.

I understand the concept in general. Some questions on your example:

  1. At what point in the config did you insert this section of the custom code?
  2. What is the scope of this @@map variable? I think this is a class variable... is this what makes it 'persistent' between different events?
  3. Why the choice of a map (Hash) when you're only keeping track of one item?! Just curious.

Here is my data and an illustration of what I need to extract and then add to each event:

Just like the illustration shows above, once I encounter the file name that I'm looking for (e.g., req_output_ALL_ALL_IC2ECFTD_580.xml) I'd like to preserve that part in some save_variable, and then add it to the end of each of the specific events that I will have carefully extracted, while skipping others. In other words, I need to "decorate" certain events with data from the save_variable.

Then, if I encounter another Response File, preserve that one (discard the old one), and use this new one to decorate the events that follow.

I guess I have no idea if such a feat could be done with "in-line" Ruby code here in this config file, or if it requires a new Filter to be created.

  1. I put it after my grok filter and before my date filter, but it depends on your entire config what the correct place is to put the ruby block.
  2. @@map is indeed a class variable and yes that is what makes it 'persistent' between different events.
  3. I used a map because I keep track of the year for each input file separately. That's why I index the map with the path field.
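On question 2, here's a plain-Ruby sketch of the scoping difference (the class and method names are made up): a @@ class variable is shared by every instance of the class, while an @ instance variable belongs to one instance.

```ruby
class FilterLike
  @@shared = []            # class variable: one copy shared by all instances
  def initialize
    @local = []            # instance variable: a fresh copy per instance
  end
  def record(x)
    @@shared << x
    @local << x
  end
  def shared_count; @@shared.length; end
  def local_count;  @local.length;  end
end

a = FilterLike.new
b = FilterLike.new
a.record(1)
b.record(2)
# b sees both recorded values via @@shared, but only its own via @local
```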

Another note: to do what you want to do, you need to limit Logstash to one worker (e.g. with -w 1), otherwise events can be processed out of order. Not sure if someone's mentioned that yet.

Ok - here is what I've tried so far:

            # Get the main data from the logs
            grok { 
                match => { 
                    "message" => [ 
                        "(?<tslice>%{DATE_EU} .... %{GREEDYDATA:cftsOutputFilePath}"
                    ]
                } 
                                 
            }
            if "_grokparsefailure" in [tags] {
                    drop { }                 
            } 
            
            # If you found cftsOutputFile, print out that element you found
            if [cftsOutputFilePath] =~ /.+/ {
                ruby{    
                    init => "@respFilename = event['cftsOutputFilePath']"
                    code => "puts @respFilename"
                }
                drop{ }
            }

Attempting 'baby steps' here: trying to isolate the existence of that one field, assign it to a simple local variable, and then simply print it out... child's play, yet it doesn't work for me... :slight_smile:

The runtime error is:

undefined local variable or method `event' for #LogStash::Filters::Ruby:0x6876a023

The init string runs before any log lines are parsed; it's just for initializing state that needs to exist before you start, so there's no event yet. You want that code in the code string.
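So a corrected version of the attempt above would move the event access into the code string, something like this (a sketch, keeping your field and variable names):

```ruby
ruby {
    # init runs once at startup -- only set up empty shared state here
    init => "@@respFilename = ''"
    # code runs once per event, where 'event' actually exists
    code => "@@respFilename = event['cftsOutputFilePath']
             puts @@respFilename"
}
```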

You might also be able to use the elasticsearch filter: https://www.elastic.co/guide/en/logstash/current/plugins-filters-elasticsearch.html . However, the problem there is the potential for races, where you need to look something up that isn't in Elasticsearch quite yet. If you can do your processing in two phases, this should work. It will of course be slower than the ruby filter, since it needs to do IO over the network.


I didn't even know that was possible - thx for that link and technique.

Ok - so I am now using the class variable, and it's getting populated with the right values at the right time:

    ruby {
        init => "@@respFilename = ''"
        code => "@@respFilename = event['cftsOutputFilePath'] if ( (@@respFilename.empty?) || (!@@respFilename.eql?(event['cftsOutputFilePath'])) )
                 puts @@respFilename"
    }

So now that I have that variable in Ruby, how do I add @@respFilename to the Event? This is the last step. I've seen examples, but they showed how to add a brand-new event; I need to augment an existing event with this data as an additional field. How do I do that?

I've tried the following with no success:

        mutate {
            add_field => { "responseFileName" => "%{@@respFilename}" }  
        }

... this, too, was not valid at runtime:

ruby{
    add_field => { "responseFileName" => "%{@@respFilename}" }  
}

Answering my own question here, for the benefit of future readers who may stumble upon this topic:

I realized through trial and error that the event variable that's exposed to the ruby filter can be used to retrieve individual pieces of the event. Thus, I was able to retrieve the message section of the event using event['message']. It was also a revelation to me that the message is a String that can be manipulated and appended to. As a novice Rubyist, not knowing the variable types, it took a while to arrive at this, even though the answer was simple: retrieve the message, test for a condition, and then concatenate the additional piece of data onto the message in name=value format, thus decorating the message portion of the event. Here is the piece of code that does the trick for me:

    ruby{
        code => "(event['message']  = event['message'] + ' cftsFileName=' + @@respFilename) if (event['message'].include?'Contract Number' and event['message'].include?'Work Number')"
    }

Is there a documentation link that would describe additional actions that you can do on that event?! How to cancel the event, how to check additional attributes, etc?

Is there anybody out there?!

Hi man, thanks for your solution, it's working for me.
Now I want to contribute a bit. I see you are trying to store the class variable in the Logstash stream event.

I made it work like this:

ruby {
    code => "event['cftsFileName'] = @@respFilename"
}

This way you write the field cftsFileName onto the event, and you can continue working with it downstream.
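Putting the two halves of this thread together, the capture and the decoration can live in one filter (a sketch using the field names from this thread and the old event['...'] API):

```ruby
ruby {
    init => "@@respFilename = ''"
    code => "
        # remember the latest response filename whenever one appears
        @@respFilename = event['cftsOutputFilePath'] if event['cftsOutputFilePath']
        # stamp the remembered value onto the current event
        event['cftsFileName'] = @@respFilename
    "
}
```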

Thank you, xxnull - that's very useful to know.


Hello, I am trying to achieve the same thing you guys are talking about, but when I try to do event['fieldName'] I get the following error:

:message=>"Ruby exception occurred: undefined method `[]'

Did you guys encounter this? It seems like the documentation indicates that I should be able to access the fields like this.

You have to do it like this:

      if ([@metadata][LOCAL_FIELD_4] =~ /.+/) {
            ruby {
                code => "
                         event['type'] = 'costing'
                         event['tslice'] = @@TimeStamp
                         event['senderID'] = @@SenderId
                "
            }
      }

I'm a bit confused. Don't you want to assign the class variable to the event value, not the other way around? And did you change this code now because logstash has updated since you wrote this post?
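A likely explanation for the undefined method `[]' error a few posts up: Logstash 5.x changed the Ruby filter's Event API, so fields are now read and written with event.get and event.set instead of the hash-style event['...']. The decoration from earlier in the thread would then look something like this:

```ruby
ruby {
    code => "
        # Logstash 5.x+ Event API: use get/set instead of []
        event.set('cftsFileName', @@respFilename)
        puts event.get('cftsFileName')
    "
}
```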