Parse numbers from Storage units in logs

Extract the numeric value from logs containing storage sizes.

Sample Logs -

physical.memory.total=31.4G, physical.memory.free=18.8G, swap.space.total=7.8G, swap.space.free=7.8G, heap.memory.used=4.5G, heap.memory.free=536.5M, heap.memory.total=5.0G, heap.memory.max=5.0G, native.memory.used=8.2M, native.memory.free=3.5G, native.memory.total=64.0M, native.memory.max=3.5G, native.meta.memory.used=80.0M, native.meta.memory.free=432.0M

I'm trying to extract the numeric value for each of these fields. The challenge is that the value can carry either a G or an M unit.
What would be the best way to achieve this? I'd appreciate any help or guidance.

Thank you

I have done something similar in the past using mutate to replace K with 000, M with 000000, etc., but it would be more accurate to do

    kv { field_split => "," value_split => "=" trim_key => " " target => "[@metadata][metrics]" }
    ruby {
        code => '
            metrics = event.get("[@metadata][metrics]")
            if metrics.is_a? Hash
                metrics.each { |k, v|
                    v.scan(/(.*)([KMG])$/) { |m|
                        case m[1]
                        when "K"
                            multiplyBy = 1024
                        when "M"
                            multiplyBy = 1024 * 1024
                        when "G"
                            multiplyBy = 1024 * 1024 * 1024
                        end
                        event.set(k, m[0].to_f * multiplyBy)
                    }
                }
            end
        '
    }
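
(For context, the cruder mutate-based replacement mentioned above would be a gsub along these lines. This is only a rough sketch: the field name is just an example, and it only really works for whole-number values such as 64M, which is why the ruby approach is more accurate.)

    mutate {
        # "64M" -> "64000000", but "536.5M" -> "536.5000000", so this is only an approximation
        gsub => [ "heap.memory.total", "K$", "000",
                  "heap.memory.total", "M$", "000000",
                  "heap.memory.total", "G$", "000000000" ]
    }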

Thanks Badger.

I gave it a try and it worked perfectly, but it dropped many other fields such as "load.process=0.00%, load.system=0.01%, load.systemAverage=1.00%". I also see some errors in the logs, probably caused by the other fields extracted by the kv filter.

Here are the sample logs -

2020-02-04 16:18:56,783 INFO Log4jFactory$Log4jLogger [10.190.100.222]:5701 [Drysdale-PROD-SC9] [3.7.6] Received auth from Connection[id=26876, /10.190.100.222:5701->/10.190.110.202:56584, endpoint=null, alive=true, type=CSHARP_CLIENT], successfully authenticated, principal : ClientPrincipal{uuid='d7d8b718-ed75-4cc3-b51a-c620bb082255', ownerUuid='058720ad-7b35-40f6-8978-bd9cf7e286ec'}, owner connection : true, client version : null

2020-02-04 16:15:27,519 INFO Log4jFactory$Log4jLogger [10.190.100.222]:5701 [Drysdale-PROD-SC9] [3.7.6] processors=8, physical.memory.total=31.4G, physical.memory.free=18.8G, swap.space.total=7.8G, swap.space.free=7.8G, heap.memory.used=4.5G, heap.memory.free=536.5M, heap.memory.total=5.0G, heap.memory.max=5.0G, heap.memory.used/total=89.51%, heap.memory.used/max=89.51%, native.memory.used=8.2M, native.memory.free=3.5G, native.memory.total=64.0M, native.memory.max=3.5G, native.meta.memory.used=80.0M, native.meta.memory.free=432.0M, native.meta.memory.percentage=90.75%, minor.gc.count=18605, minor.gc.time=121136ms, major.gc.count=0, major.gc.time=0ms, load.process=0.00%, load.system=0.01%, load.systemAverage=1.00%, thread.count=69, thread.peakCount=229, cluster.timeDiff=4083, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=351751533, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=1, proxy.count=0, clientEndpoint.count=231, connection.active.count=232, client.connection.count=231, connection.count=1

Is it possible to only affect the memory-related fields that contain the G/M units?

My current Logstash conf looks like this -

  if [name] == "hazel_logs" {
    grok {
        match => { "message" => [
            "(?m)%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:log.level}%{SPACE}\[%{HOSTNAME:hazel.server}\]\:%{NUMBER:hazelcast.port}%{SPACE}\[%{WORD:hazelcast.env}\]%{SPACE}\[(?<hazelcast.version>[\d+.]+)\]+%{SPACE}%{GREEDYDATA:msgbody}",
            "(?m)%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:log.level}%{SPACE}%{DATA:logger}%{SPACE}\[%{IPORHOST:hazel.server}\]\:%{NUMBER:hazelcast.port}%{SPACE}\[%{DATA:hazelcast.env}\]%{SPACE}\[(?<hazelcast.version>[\d+.]+)\]+%{SPACE}%{GREEDYDATA:msgbody}",
            "(?m)%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:log.level}%{SPACE}%{GREEDYDATA:msgbody}"
        ] }
    }


    kv {
        source => "msgbody"
        value_split => "="
        allow_duplicate_values => false
        remove_char_key => "\s\n{\[\]"
        remove_char_value => "\s\n{},'\[\]%"
        transform_key => "lowercase"
        recursive => "true"
        target => "[@metadata][metrics]"
        #trim_value => ",'%"
        #whitespace => "strict"
    }
    grok {
        patterns_dir => "/etc/logstash/patterns"
        break_on_match => "false"
        match => {
            "msgbody" => [ "Reason:(?<error.details>.[^\[\,\n?]+)\s?", "type=%{DATA:hazelcast.client.type}\]", "minor.gc.time=%{NUMBER:minor.gc.time.ms:int}", "major.gc.time=%{NUMBER:major.gc.time.ms:int}" ]
        }
    }
    date {
        locale => "en"
        match => ["logtime", "YYYY-MM-dd HH:mm:ss,SSS"]
        timezone => "America/Los_Angeles"
        target => "@timestamp"
    }

    ruby {
        code => "
            event.to_hash.keys.each { |k|
                if ( k.end_with?('size') || k.end_with?('count') )
                    event.set(k, event.get(k).to_i)
                elsif ( k.start_with?('load') || k.start_with?('processors') )
                    event.set(k, event.get(k).to_f)
                else
                    event.set(k, event.get(k))
                end
            }
        "
    }

############ Here ############

ruby {
    code => '
        metrics = event.get("[@metadata][metrics]")
        if metrics.is_a? Hash
            metrics.each { |k, v|
                v.scan(/(.*)([KMG])$/) { |m|
                    case m[1]
                    when "K"
                        multiplyBy = 1024
                    when "M"
                        multiplyBy = 1024 * 1024
                    when "G"
                        multiplyBy = 1024 * 1024 * 1024
                    end
                    event.set(k, m[0].to_f * multiplyBy)
                }
            }
        end
    '
}

########### Here #############

    mutate {
        replace => { "type" => "hazelcast" }
        remove_field => [ "msgbody", "logtime", "logger", "thread" ]
    }
    prune {
        blacklist_names => [ "serversocketaddr*", "[0-9]+", "gc.time$" ]
    }
}

}

OK, so instead of using

v.scan(/(.*)([KMG])$/) { |m|

use

matches = v.scan(/(.*)([KMG])$/)

then if matches is nil do a straight event.set(k, v), otherwise execute the case statement and the event.set I gave.
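
Put together, that suggestion would come out roughly like the sketch below. Note that Ruby's String#scan returns an empty array rather than nil when there is no match, so the guard checks for emptiness; the is_a?(String) check is an added assumption so that values the kv filter did not leave as plain strings are copied through instead of raising.

    ruby {
        code => '
            metrics = event.get("[@metadata][metrics]")
            if metrics.is_a? Hash
                metrics.each { |k, v|
                    unless v.is_a? String        # e.g. arrays from duplicate keys: copy through as-is
                        event.set(k, v)
                        next
                    end
                    matches = v.scan(/(.*)([KMG])$/)
                    if matches.empty?
                        event.set(k, v)          # no K/M/G suffix, keep the value unchanged
                    else
                        m = matches[0]
                        case m[1]
                        when "K"
                            multiplyBy = 1024
                        when "M"
                            multiplyBy = 1024 * 1024
                        when "G"
                            multiplyBy = 1024 * 1024 * 1024
                        end
                        event.set(k, m[0].to_f * multiplyBy)
                    end
                }
            end
        '
    }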

Hi @Badger,

So, I tried the suggested method and now it is completely dropping all of the fields discovered by the kv filter.

Exceptions in Logstash logs -

[2020-02-09T16:17:21,451][ERROR][logstash.filters.ruby ][main] Ruby exception occurred: undefined method `match' for ["CSHARP_CLIENT", "CSHARP_CLIENT."]:Array
[2020-02-09T16:17:23,973][ERROR][logstash.filters.ruby ][main] Ruby exception occurred: undefined method `[]' for nil:NilClass
[2020-02-09T16:17:26,152][ERROR][logstash.filters.ruby ][main] Ruby exception occurred: undefined method ' for nil:NilClass

Ruby Code -

ruby {
    code => "
        metrics = event.get('[@metadata][metrics]')
        if metrics.is_a? Hash
            metrics.each { |k, v|
                matches = v.match(/(.*)([KMG])$/)
                case
                when nil
                    event.set(k,v)
                when 'K'
                    multiplyBy = 1024
                when 'M'
                    multiplyBy = 1024 * 1024
                when 'G'
                    multiplyBy = 1024 * 1024 * 1024
                end
                event.set(k, matches[0].to_f * multiplyBy)
            }
        end
    "
}

Did you read the post in which I responded to you?

Do you mean this?

use

matches = v.scan(/(.*)([KMG])$/)

"then if matches is nil do a straight event.set(k, v), otherwise execute the case statement and the event.set I gave."

I did change the code to -

ruby {
    code => "
        metrics = event.get('[@metadata][metrics]')
        if metrics.is_a? Hash
            metrics.each { |k, v|
                matches = v.match(/(.*)([KMG])$/)
                case
                when nil
                    event.set(k,v)
                when 'K'
                    multiplyBy = 1024
                when 'M'
                    multiplyBy = 1024 * 1024
                when 'G'
                    multiplyBy = 1024 * 1024 * 1024
                end
                event.set(k, matches[0].to_f * multiplyBy)
            }
        end
    "
}

Am I doing something wrong here?

I also tried switching from the case statement to regular if conditions, but I still see errors and no results.
I'd appreciate any further direction.

Thanks

I ended up using the Logstash bytes filter plugin. It was a little more work, but it worked as expected.
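
For reference, a minimal sketch of that approach (the plugin is installed separately with bin/logstash-plugin install logstash-filter-bytes, it converts one source field per filter instance, the exact unit strings it accepts depend on the plugin version, and the field names here are only examples):

    bytes {
        source => "[@metadata][metrics][heap.memory.used]"   # e.g. "4.5G"
        target => "heap.memory.used.bytes"                    # numeric byte count
    }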
