Repeated field extraction and aggregation

Hello All,

I have some logs like the ones below and I would like to get a total of all the CACHE_TIMING and all the DATABASE_TIMING values. I am thinking of creating an array of these fields using a gsub replacement and the kv filter, and then adding them up (see the sketch after the sample lines).
Is there a better way to accomplish this?

RANDOM CACHE_TIMING: 10
SESSION_LOG DATABASE_TIMING: 8
ACRONYM CACHE_TIMING: 6
PARM DATABASE_TIMING: 4
UNIQUE_SESSION_ID DATABASE_TIMING: 3
RANDOM DATABASE_TIMING: 2
RANDOM CACHE_TIMING: 1
RANDOM CACHE_TIMING: 1
COMMIT DATABASE_TIMING: 1
RANDOM DATABASE_TIMING: 1
RANDOM CACHE_TIMING: 0
RANDOM CACHE_TIMING: 0
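
Roughly, what I have in mind is something like this (untested -- msg_timing is the field I keep this timing text in, and I am assuming the kv filter collects repeated keys into arrays):

    filter {
        mutate {
            # Strip the leading name from each line so only the timing keyword
            # is left as the key, e.g. "\t\tCOMMIT DATABASE_TIMING: 121"
            # becomes "DATABASE_TIMING: 121"
            gsub => [ "msg_timing", "^[\t ]*\S+ ", "" ]
        }
        kv {
            # One "KEY: value" pair per line; repeated keys should end up as
            # arrays, e.g. DATABASE_TIMING => ["121", "8", ...]
            source      => "msg_timing"
            field_split => "\n"
            value_split => ":"
            trim_value  => " "
        }
    }

Even then I would still need a ruby filter or something similar to add up the array entries, hence the question.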

Are all those part of one event? If you use

output { stdout { codec => rubydebug } }

then what does a single event look like?

The actual event is too large. I have extracted the other important fields and kept this timing-related data in a separate field, so I only have to work on a small section.

OK, so what does that separate field look like in the rubydebug output?

The output looks like this:

msg_timing" => "\t\tCOMMIT DATABASE_TIMING: 121\n\t\tPARM CACHE_TIMING: 14\n\t\tHOLIDAY CACHE_TIMING: 11\n\t\tDESK_PARM_XP CACHE_TIMING: 10\n\t\tACRONYM CACHE_TIMING: 9\n\t\tPARM DATABASE_TIMING: 8\n\t\tCIRCLE DATABASE_TIMING: 6\n\t\tBRAND_INV_TYPE CACHE_TIMING: 5\n\t\tSERVICE_STATUS DATABASE_TIMING: 4\n\t\tCORRESPONDENT CACHE_TIMING: 4\n\t\tV$MYSTAT DATABASE_TIMING: 3\n\t\tGROUP_PRICE_LEVEL CACHE_TIMING: 3\n\t\tWATCHLIST_OFFERING CACHE_TIMING: 3\n\t\tDBMS_SESSION.UNIQUE_SESSION_ID DATABASE_TIMING: 2\n\t\tPARM CACHE_TIMING: 2\n\t\tRESULT_FILTER CACHE_TIMING: 2\n\t\tBRAND_USER_SUBTYPE CACHE_TIMING: 1\n\t\tTRA CACHE_TIMING: 1\n\t\tSESSION_LOG DATABASE_TIMING: 1\n\t\tPKG_LOG_SESSION.START_LOG DATABASE_TIMING: 1\n\t\tORDER_CONFIG_USER DATABASE_TIMING: 1\n\t\tUSER_GROUP_FOR__USER DATABASE_TIMING: 1\n\t\tUSER_SUBTYPE CACHE_TIMING: 0\n\t\tDESK_PRICING_SOURCE CACHE_TIMING: 0\n\t\tBOND_ISSUE CACHE_TIMING: 0\n\t\tBRAND CACHE_TIMING: 0\n "

I would do that in a ruby filter

    ruby {
        code => '
            message = event.get("message")
            db = message.scan(/ DATABASE_TIMING: (\d+)/)
            cache = message.scan(/ CACHE_TIMING: (\d+)/)
            # This gets us
            # [["121"], ["8"], ["6"], ["4"], ["3"], ["2"], ["1"], ["1"], ["1"], ["1"]]
            # [["14"], ["11"], ["10"], ["9"], ["5"], ["4"], ["3"], ["3"], ["2"], ["2"], ["1"], ["1"], ["0"], ["0"], ["0"], ["0"]]
            db = db.flatten         # Flatten inner arrays
            db = db.map(&:to_i)     # Convert array entries to integers
            db = db.reduce(0, :+)   # Sum array entries

            cache = cache.flatten
            cache = cache.map(&:to_i)
            cache = cache.reduce(0, :+)

            event.set("totalCacheTiming", cache)
            event.set("totalDatabaseTiming", db)
        '
    }

will get you

  "totalCacheTiming" => 65,
"totalDatabaseTiming" => 148,

That worked like a charm! Thank you very much, Badger.
