I'm using the Ruby filter below to find the elapsed time between a start and end event for a given user. The code works fine when events arrive in sequence, but I have event logs coming from multiple users. I have used a File_ID condition to avoid mismatches between users; however, I'm still missing a few events because the variable gets overwritten when multiple Start events arrive before their corresponding End events.
Sure, you can maintain a hash of hashes, where the outer hash is keyed by the ID, and the inner hash contains variables. That's what the map is in an aggregate filter. Just make sure you address the synchronization and threading issues that an aggregate filter addresses.
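As a sketch of the hash-of-hashes idea in plain Ruby (the keys and field names here are illustrative, not from the original filter):

```ruby
# Outer hash keyed by File_ID; the inner hash holds per-file variables,
# so concurrent Start events for different files do not clobber each other.
state = Hash.new { |h, k| h[k] = {} }

state['file_1']['start_time'] = 100.0
state['file_2']['start_time'] = 101.0   # second Start, different key, no clobber

# On the matching End event, compute elapsed time and clean up that key.
elapsed = 105.0 - state['file_1']['start_time']
state.delete('file_1')

puts elapsed   # => 5.0
```

This is essentially what the `map` in an aggregate filter gives you, with locking around access added on top.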
Apologies for the delay; I have a quick follow-up question on the above solution. I tried the Ruby filter below to create a hash keyed on the File_ID field.
Ruby Filter
ruby {
    init => "@save_the_timestamp = Hash.new"
    code => "
        if event.get('Event') == 'Start'
            save_the_fileID = event.get('File_ID')
            @save_the_timestamp[save_the_fileID] = event.get('@timestamp')
        elsif event.get('Event') == 'End'
            if @save_the_timestamp[event.get('File_ID')]
                event.set('reqtime', event.get('@timestamp') - @save_the_timestamp[save_the_fileID])
                @save_the_timestamp.delete(event.get('File_ID'))
            end
        end
    "
}
But I am receiving the error below:
[2020-06-04T23:28:46,051][ERROR][logstash.filters.ruby ][main] Ruby exception occurred: can't convert nil into an exact number
Could you please have a look and let me know what's going wrong here?
No, for a given File_ID, End always appears after the corresponding Start event. Also, in the End branch I am in any case checking for the existence of the hash entry before performing any action.
And yes, I have done both: disabled java_execution and set pipeline.workers to 1.
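The nil error is consistent with `save_the_fileID` being a plain local variable: the `code` string runs once per event, so the variable assigned during the Start event no longer exists when the End event is processed. `@save_the_timestamp[save_the_fileID]` then becomes `@save_the_timestamp[nil]`, which is `nil`, and subtracting `nil` from a timestamp raises. Looking the key up via `event.get('File_ID')` in both branches avoids this. A minimal plain-Ruby sketch of the corrected logic (events stubbed as hashes, timestamps as floats for simplicity):

```ruby
save_the_timestamp = {}

# Hypothetical stand-ins for Logstash events, interleaved across two files.
events = [
  { 'Event' => 'Start', 'File_ID' => 'A', '@timestamp' => 100.0 },
  { 'Event' => 'Start', 'File_ID' => 'B', '@timestamp' => 101.0 },
  { 'Event' => 'End',   'File_ID' => 'A', '@timestamp' => 105.0 },
  { 'Event' => 'End',   'File_ID' => 'B', '@timestamp' => 110.0 },
]

results = {}
events.each do |event|
  file_id = event['File_ID']   # look up the key in BOTH branches
  if event['Event'] == 'Start'
    save_the_timestamp[file_id] = event['@timestamp']
  elsif event['Event'] == 'End' && save_the_timestamp[file_id]
    results[file_id] = event['@timestamp'] - save_the_timestamp[file_id]
    save_the_timestamp.delete(file_id)   # clean up once the pair is matched
  end
end

puts results.inspect
```

In the actual filter, the same one-line change applies: use `@save_the_timestamp[event.get('File_ID')]` (or assign `file_id = event.get('File_ID')` at the top of the `code` block) instead of referencing `save_the_fileID` in the End branch.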