Repeated field extraction and aggregation

Hello All,

I have some logs like the ones below and I would like to get a total of all the CACHE_TIMING and all the DATABASE_TIMING values. I am thinking of creating an array of these fields using a gsub replacement and the kv filter, and then adding them up (see the sketch after the sample lines).
Is there a better way to accomplish this?

RANDOM CACHE_TIMING: 10
SESSION_LOG DATABASE_TIMING: 8
ACRONYM CACHE_TIMING: 6
PARM DATABASE_TIMING: 4
UNIQUE_SESSION_ID DATABASE_TIMING: 3
RANDOM DATABASE_TIMING: 2
RANDOM CACHE_TIMING: 1
RANDOM CACHE_TIMING: 1
COMMIT DATABASE_TIMING: 1
RANDOM DATABASE_TIMING: 1
RANDOM CACHE_TIMING: 0
RANDOM CACHE_TIMING: 0
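
Roughly, what I have in mind is something like this (untested -- msg_timing is the field I keep this timing text in, and I am assuming the kv filter collects repeated keys into arrays):

    filter {
        mutate {
            # Strip the leading name from each line so only the timing keyword
            # is left as the key, e.g. "\t\tCOMMIT DATABASE_TIMING: 121"
            # becomes "DATABASE_TIMING: 121"
            gsub => [ "msg_timing", "^[\t ]*\S+ ", "" ]
        }
        kv {
            # One "KEY: value" pair per line; repeated keys should end up as
            # arrays, e.g. DATABASE_TIMING => ["121", "8", ...]
            source      => "msg_timing"
            field_split => "\n"
            value_split => ":"
            trim_value  => " "
        }
    }

Even then I would still need a ruby filter or something similar to add up the array entries, hence the question.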

Are all those part of one event? If you use

output { stdout { codec => rubydebug } }

then what does a single event look like?

The actual event is too large. I have extracted the other important fields and kept this timing-related data in a separate field, so I only have to work on a small section.

OK, so what does that separate field look like in the rubydebug output?

The output looks like this:

msg_timing" => "\t\tCOMMIT DATABASE_TIMING: 121\n\t\tPARM CACHE_TIMING: 14\n\t\tHOLIDAY CACHE_TIMING: 11\n\t\tDESK_PARM_XP CACHE_TIMING: 10\n\t\tACRONYM CACHE_TIMING: 9\n\t\tPARM DATABASE_TIMING: 8\n\t\tCIRCLE DATABASE_TIMING: 6\n\t\tBRAND_INV_TYPE CACHE_TIMING: 5\n\t\tSERVICE_STATUS DATABASE_TIMING: 4\n\t\tCORRESPONDENT CACHE_TIMING: 4\n\t\tV$MYSTAT DATABASE_TIMING: 3\n\t\tGROUP_PRICE_LEVEL CACHE_TIMING: 3\n\t\tWATCHLIST_OFFERING CACHE_TIMING: 3\n\t\tDBMS_SESSION.UNIQUE_SESSION_ID DATABASE_TIMING: 2\n\t\tPARM CACHE_TIMING: 2\n\t\tRESULT_FILTER CACHE_TIMING: 2\n\t\tBRAND_USER_SUBTYPE CACHE_TIMING: 1\n\t\tTRA CACHE_TIMING: 1\n\t\tSESSION_LOG DATABASE_TIMING: 1\n\t\tPKG_LOG_SESSION.START_LOG DATABASE_TIMING: 1\n\t\tORDER_CONFIG_USER DATABASE_TIMING: 1\n\t\tUSER_GROUP_FOR__USER DATABASE_TIMING: 1\n\t\tUSER_SUBTYPE CACHE_TIMING: 0\n\t\tDESK_PRICING_SOURCE CACHE_TIMING: 0\n\t\tBOND_ISSUE CACHE_TIMING: 0\n\t\tBRAND CACHE_TIMING: 0\n "

I would do that in a ruby filter

    ruby {
        code => '
            message = event.get("message")
            db = message.scan(/ DATABASE_TIMING: (\d+)/)
            cache = message.scan(/ CACHE_TIMING: (\d+)/)
            # This gets us
            # [["121"], ["8"], ["6"], ["4"], ["3"], ["2"], ["1"], ["1"], ["1"], ["1"]]
            # [["14"], ["11"], ["10"], ["9"], ["5"], ["4"], ["3"], ["3"], ["2"], ["2"], ["1"], ["1"], ["0"], ["0"], ["0"], ["0"]]
            db = db.flatten         # Flatten inner arrays
            db = db.map(&:to_i)     # Convert array entries to integers
            db = db.reduce(0, :+)   # Sum array entries

            cache = cache.flatten
            cache = cache.map(&:to_i)
            cache = cache.reduce(0, :+)

            event.set("totalCacheTiming", cache)
            event.set("totalDatabaseTiming", db)
        '
    }

will get you

  "totalCacheTiming" => 65,
"totalDatabaseTiming" => 148,

That worked like a charm! Thank you very much, Badger.
