Kindly suggest a filter strategy for "csv in csv" data

Dear all,

I believe I have read all the current filter docs, but I would still like input from more knowledgeable folks: I want to parse currency counter information with Logstash.

The input is a text file, shipped in via Filebeat, with lines structured like:
2016-12-22 18:59|wallet|USD:100:8,USD:200:4,USD:500:0,USD:1000:1,USD:2000:6,USD:5000:1,USD:10000:3
2016-12-22 18:59|piggybank|USD:100:47,USD:200:1,USD:500:0,USD:1000:8,USD:2000:15,USD:5000:0,USD:10000:1

My log lines may contain any ISO currency code.
Depending on the currency, the number of repeated denomination entries varies.
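
To make the two nesting levels concrete, here is a minimal plain-Ruby sketch that takes one such line apart outside Logstash. The method name parse_counter_line and the keys :timestamp, :owner and :counters are made up for this illustration; it assumes every line has exactly three pipe-separated fields and that each inner entry is a currency:denomination:count triple.

```ruby
# Hypothetical helper (not part of Logstash): splits one log line into its
# outer fields and a nested hash of { currency => { denomination => count } }.
def parse_counter_line(line)
  # Outer level: timestamp | owner | comma-separated triples
  timestamp, owner, counters = line.chomp.split("|", 3)

  # Each currency gets its own inner hash on first use.
  nested = Hash.new { |hash, code| hash[code] = {} }

  # Inner level: CURRENCY:DENOMINATION:COUNT
  counters.split(",").each do |triple|
    code, denomination, count = triple.split(":")
    nested[code][denomination.to_i] = count.to_i
  end

  { timestamp: timestamp, owner: owner, counters: nested }
end

record = parse_counter_line("2016-12-22 18:59|wallet|USD:100:8,USD:200:4,EUR:500:1")
puts record[:owner]
puts record[:counters].inspect
```

Keeping one inner hash per currency means a mixed-currency line does not mix up the counts of different currencies.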

The output should enable me to generate stacked bar charts over time, either in Kibana or maybe even with Jupyter Notebooks utilizing matplotlib.

I will import my own data and maybe my wife's -- but I will have an identifier field to distinguish.

After reading the docs I am tempted to use a ruby filter.
Is that the best fitting strategy?

(I will be perfectly able to dabble with the code myself, but rather want an educated hint if that is the right direction.)

I am not familiar with ruby, but as my data looks quite recursive I found sample code for recursive "split" at http://billpatrianakos.me/blog/2015/05/31/turn-a-string-into-a-hash-with-string-dot-to-hash-in-ruby/

line1 = 'EUR:500:1,EUR:1000:2,EUR:2000:3,EUR:5000:4,EUR:10000:5,EUR:20000:6,EUR:50000:7'
class String
  # Turns "CUR:denomination:count,..." into { "CUR" => { "denomination" => "count" } }
  def to_hash(arr_sep=',', key_sep=':')
    currency = {}

    self.split(arr_sep).each do |e|
      code, denomination, count = e.split(key_sep)
      # One inner hash per currency, so mixed-currency lines do not share counts
      (currency[code] ||= {})[denomination] = count
    end

    currency
  end
end
line1.to_hash

produces
=> {"EUR"=>{"500"=>"1", "1000"=>"2", "2000"=>"3", "5000"=>"4", "10000"=>"5", "20000"=>"6", "50000"=>"7"}}

Obviously that is a bit heavy to run in a filter for every message, and the 'denomination' hash should maybe just become a tag or field.

But before I am tempted to write a filter plugin I'd still like to hear other opinions.

It seems as if this code does give me "graphable" data:

if [loganstate] == "counters_total" {
    ruby{
        init => "
        "
        code => '
            array = event.get("counters_total").split(",")
            array.each do |e|
                key_value = e.split(":")
                event.set("currency", key_value[0])
                event.set(key_value[0] + "_" + key_value[1], key_value[2].to_i)
            end
        '
    }
}

At least I could then select ~7 fields for a stacked bar graph. Hooray!

@cprior your approach is sane; I only suggest a slight tweak to the Ruby code:

currency, value, quantity = e.split(":")
event.set("currency", currency)
event.set("#{currency}_#{value}", quantity.to_i)
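
To try the tweak outside Logstash, here is a small sketch that uses a plain Hash as a stand-in for the event. The method name flatten_counters is hypothetical, and the Hash only mimics what event.set would store; it is not the Logstash event API.

```ruby
# Stand-in for the Logstash event: a plain Hash, so this runs without Logstash.
# flatten_counters is a made-up name used only in this sketch.
def flatten_counters(counters)
  fields = {}
  counters.split(",").each do |e|
    # Destructure the triple instead of indexing into an array
    currency, value, quantity = e.split(":")
    fields["currency"] = currency                   # the last currency seen wins
    fields["#{currency}_#{value}"] = quantity.to_i  # e.g. "USD_100" => 8
  end
  fields
end

fields = flatten_counters("USD:100:8,USD:200:4,USD:500:0")
fields.each { |name, value| puts "#{name} = #{value}" }
```

Note that the single "currency" field is overwritten on every iteration, so on a mixed-currency line only the last code would remain in it.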

Ah, that is a much cleaner syntax indeed!

The #{...} syntax is string interpolation, which I just learned about from http://ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/syntax.html#string -- good to know, concatenating with + never feels right.

Many thanks for your opinion!

By the way, when I figured out that I should use .to_i, I was mightily impressed that the value ended up in "visualization" as a selectable field in the drop-down list, because it turned into a number in Elasticsearch.
(I deleted the index after each test, so I also got a fresh "manage index" on every test run.)

Another gotcha was class variables with @@, explained in http://www.railstips.org/blog/archives/2006/11/18/class-and-instance-variables-in-ruby/
I use one in the init part to keep track of whether I am in a multi-currency logfile or not.
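
For anyone else new to Ruby, this is roughly what a class variable does, as a plain-Ruby sketch. CurrencyTracker is a hypothetical class made up for this illustration and has nothing to do with how the ruby filter is implemented.

```ruby
# CurrencyTracker is a hypothetical class, not part of Logstash.
class CurrencyTracker
  @@seen = []   # class variable: one copy shared by every instance

  # Remember a currency code the first time it shows up.
  def note(code)
    @@seen << code unless @@seen.include?(code)
  end

  # True once more than one distinct currency has been noted.
  def multi_currency?
    @@seen.size > 1
  end
end

a = CurrencyTracker.new
b = CurrencyTracker.new
a.note("USD")
b.note("EUR")
puts a.multi_currency?
```

Both instances write to the same @@seen, which is why a sees the currency that b noted.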

Why not just do two csv filters, then a kv?

I skipped that filter because the smallest units are not pairs but triples. It seemed to me that https://github.com/logstash-plugins/logstash-filter-kv/blob/master/lib/logstash/filters/kv.rb#L350 returns only a single value, although some (for me) funky Ruby goes on with the key, v1, v2, ... v6 above it.
