Persistence Issue with Logstash

I'm trying to create a script that runs through the previous day's logs and counts occurrences of a particular field in the hash. There are hundreds of possible values for this key, including nil. I've tried a couple of different Ruby filter scripts I wrote, but I keep losing data.

Currently, I extract event['store_location'], open a csv file, verify the key exists, increment that key's value by one, and re-save the file. Obviously this is an expensive operation given the number of logs that need to be parsed each day, since the file is opened and re-saved for every single log. Clearly having a way to persist the csv data would be best, at least in my mind, but I can't figure out the best way to do this.

Any advice/assistance would be greatly appreciated.

Here's my current code:

require 'csv'

currentStore = event["store"]

# Treat a missing field as the literal string "nil" so it still gets counted
if currentStore.nil?
    currentStore = "nil"
end

store_data = {}

# Load the existing counts from disk, skipping malformed lines
File.open('store_count.csv').each_line { |line|
    line_data = line.split(",")
    if !line_data[1].nil? && !line_data[1].empty?
        store_data[line_data[0]] = line_data[1].strip.to_i
    else
        next
    end
}

# Increment the count, initialising stores not yet in the file
# (previously, events for unknown stores were silently dropped)
if store_data.key?(currentStore)
    store_data[currentStore] += 1
else
    store_data[currentStore] = 1
end

# Re-save the whole file -- this runs once per event, which is the bottleneck
CSV.open("store_count.csv", "wb") { |csv|
    store_data.to_a.each { |elem|
        csv << elem
    }
}
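One way to avoid reopening the file per event is to load the counts into a hash once, increment in memory for every log line, and write the file once at the end of the run. A rough standalone sketch of that shape (the file name store_count.csv and the helper names are just for illustration):

```ruby
require 'csv'

# Load existing counts once; a missing file just yields an empty tally.
def load_counts(path)
  counts = Hash.new(0)
  return counts unless File.exist?(path)
  CSV.foreach(path) do |store, count|
    counts[store] = count.to_i unless count.nil?
  end
  counts
end

# Increment in memory for each event -- no file I/O per log line.
def tally(counts, store)
  counts[store.nil? ? "nil" : store] += 1
end

# Write the whole hash back once, after all events are processed.
def save_counts(path, counts)
  CSV.open(path, "wb") do |csv|
    counts.each { |store, count| csv << [store, count] }
  end
end
```

Because the hash uses a default of 0, stores that were never seen before are counted correctly without an explicit key check.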

How about dumping all the data into Elasticsearch and letting it do the per-day aggregation for you? Failing that, perhaps Redis would be a better persistence option. It has a hash type that allows atomic increments of its integer contents.
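If the Redis route appeals, it could look roughly like this inside a ruby filter block. This is only a sketch: the key name daily_store_counts is made up, and it assumes the redis gem is available to Logstash and a Redis server is reachable on the default port.

```
filter {
  ruby {
    init => "require 'redis'; @redis = Redis.new"
    code => "
      store = event['store'] || 'nil'
      # HINCRBY is atomic, so concurrent filter workers cannot lose counts
      @redis.hincrby('daily_store_counts', store, 1)
    "
  }
}
```

At the end of the day, HGETALL on that key returns the full store => count hash in one call, ready to be written out as csv.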

Hi Magnus...I don't think the Elasticsearch option will work on its own, since I also need a list of all of the zero-count stores. Could I, however, try that and then take the counts greater than zero and update them in the csv file that contains all of the fields? If so, how would I create the aggregation that gives the 'store_count'? I'm a little unclear about this part.

Basically, I want a count for store1, store2, store3, etc., and each time a store appears its value needs to be incremented by 1. At the end of each day I need to produce a comprehensive .csv file with the number of times each store appeared in the logs. Currently the data is in Elasticsearch; I'm pulling the previous day's index and tallying the occurrences for each store.
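For the aggregation part, a terms aggregation on the store field returns the non-zero counts in a single request, and the zero case can be handled client-side by pre-seeding a hash with every known store. A rough Ruby sketch (the field name "store", the agg size, and the helper name are assumptions, not taken from your config):

```ruby
require 'json'

# The body you would POST to yesterday's index, e.g.
# logstash-YYYY.MM.DD/_search. "size" => 0 skips the hits themselves;
# a large terms size is meant to capture every distinct store bucket.
AGG_QUERY = {
  "size" => 0,
  "aggs" => {
    "store_count" => {
      "terms" => { "field" => "store", "size" => 1000 }
    }
  }
}.to_json

# Merge the aggregation buckets into a zero-seeded hash so stores that
# never appeared in yesterday's logs still show up with a count of 0.
def counts_with_zeros(known_stores, buckets)
  counts = known_stores.each_with_object({}) { |s, h| h[s] = 0 }
  buckets.each { |b| counts[b["key"]] = b["doc_count"] }
  counts
end
```

The buckets come back under aggregations.store_count.buckets in the search response, each with a "key" and a "doc_count"; the merged hash can then be written straight to the daily csv.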