Kindly suggest a filter strategy for "csv in csv" data

cprior · December 22, 2016, 6:04pm

Dear all,

I believe I have read all current filter docs, but I'd still like input from more knowledgeable folks: I want to parse currency counter information with Logstash.

The input is a text file, shipped in via filebeat:
2016-12-22 18:59|wallet|USD:100:8,USD:200:4,USD:500:0,USD:1000:1,USD:2000:6,USD:5000:1,USD:10000:3
2016-12-22 18:59|piggybank|USD:100:47,USD:200:1,USD:500:0,USD:1000:8,USD:2000:15,USD:5000:0,USD:10000:1

My loglines may contain all ISO currency codes.
Depending on the currency there may be varying repetitions.

The output should enable me to generate stacked bar charts over time, either in Kibana or maybe even with Jupyter Notebooks utilizing matplotlib.

I will import my own data and maybe my wife's -- but I will have an identifier field to distinguish.

After reading the docs I am tempted to use a ruby filter.
Is that the best fitting strategy?

(I will be perfectly able to dabble with the code myself, but rather want an educated hint if that is the right direction.)
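
For readers who want to experiment with the line format shown above outside Logstash, here is a minimal plain-Ruby sketch of the three nested splits ("|", then ",", then ":"). The helper name counter_rows and the one-row-per-denomination layout are assumptions for illustration, not something from this thread.

```ruby
# Hypothetical helper: the name and the row layout are illustrative assumptions.
def counter_rows(line)
  # Outer level: timestamp | identifier | counters
  timestamp, owner, counters = line.chomp.split("|", 3)

  # Middle level: comma-separated units; inner level: currency:denomination:count
  counters.split(",").map do |unit|
    currency, denomination, count = unit.split(":")
    {
      "timestamp"    => timestamp,
      "owner"        => owner,
      "currency"     => currency,
      "denomination" => denomination.to_i,
      "count"        => count.to_i
    }
  end
end

rows = counter_rows("2016-12-22 18:59|wallet|USD:100:8,USD:200:4,USD:500:0")
rows.each { |row| puts row.inspect }
```

One row per denomination is only one possible shape; flat fields per denomination would work as well, depending on what the charting tool prefers.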

cprior · December 22, 2016, 8:44pm

I am not familiar with ruby, but as my data looks quite recursive I found sample code for recursive "split" at http://billpatrianakos.me/blog/2015/05/31/turn-a-string-into-a-hash-with-string-dot-to-hash-in-ruby/

line1 = 'EUR:500:1,EUR:1000:2,EUR:2000:3,EUR:5000:4,EUR:10000:5,EUR:20000:6,EUR:50000:7'
class String
  def to_hash(arr_sep=',', key_sep=':')
    array = self.split(arr_sep)
    currency = {}

    array.each do |e|
      key_value = e.split(key_sep)
      # Look up (or create) this currency's own hash so mixed-currency lines stay separate
      denomination = (currency[key_value[0]] ||= {})
      denomination[key_value[1]] = key_value[2]
    end

    return currency
  end
end
line1.to_hash

produces
=> {"EUR"=>{"500"=>"1", "1000"=>"2", "2000"=>"3", "5000"=>"4", "10000"=>"5", "20000"=>"6", "50000"=>"7"}}

Obviously that is a bit heavy to run in a filter for every message, and the 'denomination' hash should maybe just become a tag or field.

But before I am tempted to write a filter plugin I'd still like to hear other opinions.

cprior · December 23, 2016, 1:07pm

It seems as if this code does give me "graphable" data:

if [loganstate] == "counters_total" {
    ruby{
        init => "
        "
        code => '
            array = event.get("counters_total").split(",")
            array.each do |e|
                key_value = e.split(":")
                event.set("currency", key_value[0])
                event.set(key_value[0] + "_" + key_value[1], key_value[2].to_i)
            end
        '
    }
}

At least I could then select ~7 fields for a stacked bar graph. hooray

jsvd · December 23, 2016, 5:14pm

@cprior your approach is sane; I only suggest a slight tweak to the ruby code:

currency, value, quantity = e.split(":")
event.set("currency", currency)
event.set("#{currency}_#{value}", quantity.to_i)
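
To try this tweak outside Logstash, a plain Hash can stand in for the event. This is only a sketch: the fields hash and the sample string are assumptions for illustration, and in the real filter the assignments would go through event.set as above.

```ruby
# A plain Hash stands in for the Logstash event (assumption for illustration).
fields = {}
counters = "EUR:500:1,EUR:1000:2,EUR:2000:3"

counters.split(",").each do |e|
  # Destructuring assigns the three parts of each unit in one step.
  currency, value, quantity = e.split(":")
  fields["currency"] = currency
  # String interpolation builds the field name, e.g. "EUR_500".
  fields["#{currency}_#{value}"] = quantity.to_i
end

puts fields.inspect
```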

cprior · December 23, 2016, 7:39pm

Ah, that is a much cleaner syntax indeed!

The #{} is string interpolation (http://ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/syntax.html#string), which I just learned -- good to know, concatenating with + never feels right.

Many thanks for your opinion!

By the way, when I figured out that I should use .to_i I was mightily impressed that it ended up in "visualization" as a selectable field in the drop-down list because it turned into a number in elasticsearch.
(I deleted the index after each test, so I also got a fresh "manage index" on every test run.)

Another gotcha was the class variables with @@, explained in http://www.railstips.org/blog/archives/2006/11/18/class-and-instance-variables-in-ruby/
I use that in the init part to keep track if I am in a multi-currency logfile or not.
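
The actual init code is not shown in this thread, so the following is only a generic Ruby sketch of how a class variable can remember state across objects. The class name CurrencyTracker and its methods are hypothetical.

```ruby
require "set"

# Hypothetical tracker; the class name and methods are illustrative only.
class CurrencyTracker
  # Class variable: one shared set for every instance of the class.
  @@seen = Set.new

  def record(code)
    @@seen << code
  end

  # True once more than one distinct currency code has been recorded.
  def self.multi_currency?
    @@seen.size > 1
  end
end

CurrencyTracker.new.record("USD")
CurrencyTracker.new.record("EUR")
puts CurrencyTracker.multi_currency?
```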

warkolm · December 25, 2016, 3:24am

Why not just do two csv filters, then a kv?

cprior · December 25, 2016, 6:10am

I skipped that filter because the smallest units are not pairs but triples. It seemed to me that https://github.com/logstash-plugins/logstash-filter-kv/blob/master/lib/logstash/filters/kv.rb#L350 returns that single value, although some (for me) funky ruby goes on with this key, v1, v2, ... v6 above it.
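
The pair-versus-triple mismatch can be illustrated in plain Ruby, independent of how the kv filter is actually implemented. This is a sketch with a made-up two-unit sample string.

```ruby
units = "USD:100:8,USD:200:4".split(",")

# Treating each unit as a key/value pair leaves the denomination glued to the count.
pairs = units.map { |u| u.split(":", 2) }

# Keyed on the currency alone, later units overwrite earlier ones.
as_hash = pairs.to_h

# Splitting into three parts keeps every denomination distinct.
triples = units.map { |u| u.split(":") }

puts pairs.inspect
puts as_hash.inspect
puts triples.inspect
```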

system · January 22, 2017, 6:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
