How to match two key-value pairs in an unstructured string laid out in columns

I have a document with some lines like:

  Example requested: 24:00:00            Example Used: 01:14:11        

What I want to have is:

{
"example_requested": "24:00:00",
"example_used": "01:14:11"
}

What I tried (this is a Ruby block):

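    if x =~ /(Example requested)/
      # Skip two "label: " prefixes and capture the rest of the line
      example_used = x.scan(/^(?:.*?: ){2}(.*)$/)[0]
      # Join the captured group and strip all whitespace
      example_used = (example_used * "").gsub(/\s+/, "")
    end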

I get

"example_used": "01:14:11"

But when I change {2} to {1} to get example_requested, the only thing I get is:
"example_requested" : "24:00:00ExampleUsed:01:14:11"

What is the best way to achieve this?
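The difference between `{2}` and `{1}` comes from the greedy `(.*)$`: it captures everything from the end of the skipped prefix to the end of the line, so with `{1}` it also swallows the second column. A minimal plain-Ruby sketch (run outside Logstash; `line` is a hypothetical variable holding one line of the document) that reproduces both results and shows one way to stop at the column gap by capturing only non-space characters with `\S+`:

```ruby
line = "  Example requested: 24:00:00            Example Used: 01:14:11        "

# Skip two "label: " prefixes, then capture to the end of the line ({2} case).
used = line.scan(/^(?:.*?: ){2}(.*)$/)[0].join.gsub(/\s+/, "")

# Skip only one prefix: the greedy (.*) runs to the end of the line, past the gap.
requested_greedy = line.scan(/^(?:.*?: ){1}(.*)$/)[0].join.gsub(/\s+/, "")

# Capturing non-space characters (\S+) stops at the first space after the value.
requested = line[/Example requested:\s*(\S+)/, 1]

puts used              # 01:14:11
puts requested_greedy  # 24:00:00ExampleUsed:01:14:11
puts requested         # 24:00:00
```

The `\S+` variant assumes values never contain spaces, which holds for the samples in this thread.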

You could use a kv filter

    kv {
        field_split_pattern => "\s+"
        value_split_pattern => ":"
        trim_value => " "
    }

to get

 "requested" => "24:00:00",
      "Used" => "01:14:11",

Or you could use a ruby filter:

    ruby {
        code => '
            message = event.get("message")
            matches = message.scan(/\s*([a-zA-Z ]+): (\d{2}:\d{2}:\d{2})/)
            matches.each { |x|
                event.set(x[0].downcase.gsub(/ /, "_"), x[1])
            }
        '
    }

will get you

"example_requested" => "24:00:00",
     "example_used" => "01:14:11",

Both look fragile to me.
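The ruby filter's logic can be tried outside Logstash. In this minimal sketch a plain Hash named `fields` stands in for the Logstash event (an assumption for testing only, since `event.get`/`event.set` exist only inside the filter); the regex and key normalisation are the same as in the filter above:

```ruby
message = "  Example requested: 24:00:00            Example Used: 01:14:11        "

# Stand-in for the Logstash event: field name => value.
fields = {}

# A label made of letters and spaces, a colon, exactly one space,
# then an hh:mm:ss value.
message.scan(/\s*([a-zA-Z ]+): (\d{2}:\d{2}:\d{2})/).each do |label, value|
  # "Example requested" -> "example_requested"
  fields[label.downcase.gsub(/ /, "_")] = value
end

p fields
```

This yields `example_requested` => `24:00:00` and `example_used` => `01:14:11`. Part of the fragility is visible here: the value must be exactly `hh:mm:ss`, and exactly one space must follow the colon.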

That's true, but I forgot to mention that some lines don't contain just numbers, e.g. one of them looks like:

Memory Requested:   3.5TB                 Memory Used: 668.41GB

Then change the regexp from (\d{2}:\d{2}:\d{2}) to something like ([\d:.BKMGT]).

(\d{2}:\d{2}:\d{2}) works for the case

"example_requested" => "24:00:00",
 "example_used" => "01:14:11",

but ([\d:.BKMGT]) leaves me with some strange values: what was Memory Requested: 4.5TB becomes tb______memory_requested => "6"

([\d:.BKMGT]+) perhaps
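Without the `+`, the character class matches exactly one character, so each value is cut to its first character. Separately, `: ` demands exactly one space after the colon, so `Memory Requested:   3.5TB` never matches; the scan resumes inside the value at `TB`, which then becomes part of the next label. A plain-Ruby sketch using the sample line (column spacing approximated; `extract` is a hypothetical helper wrapping the filter's scan and key normalisation):

```ruby
# Sample line from the thread (column spacing approximated).
line = "Memory Requested:   3.5TB                 Memory Used: 668.41GB"

# Hypothetical helper: scan, then normalise labels the way the ruby filter does.
def extract(message, regex)
  message.scan(regex).to_h { |label, value| [label.downcase.gsub(/ /, "_"), value] }
end

# Single character from the class: the value is cut to its first character.
single = extract(line, /\s*([a-zA-Z ]+): ([\d:.BKMGT])/)

# With "+": the whole value is captured...
plus = extract(line, /\s*([a-zA-Z ]+): ([\d:.BKMGT]+)/)

# ...but ": " still needs exactly one space, so "Memory Requested" is skipped
# and the label picks up "TB" plus the column gap.
p single  # one key, "tb_..._memory_used", with value "6"
p plus    # the same key, now with value "668.41GB"
```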

Now values are fine, but example_requested is messed up.

    ======================================================================================
                      Resource Usage on 2021-06-27 00:18:22:
                                               CPU Time Used: 1282:26:07
       Memory Requested:   4.5TB                 Memory Used: 668.34GB
       Example requested: 24:00:00               Example Used: 01:14:11
       FS requested:    400.0GB                  FS used: 8.16MB
    ======================================================================================

The output is:

          "tb_________________memory_used" : "668.34GB",
          "gb________________fs_used" : "8.16MB",
          "example_requested" : "24",
          "cpu_time_used" : "1282",
          "example_used" : "01"

I suggested ([\d:.BKMGT]+), not ([\d:.BKMGT+])

I was wrongly editing your quote, sorry. Still, it doesn't find the Memory Requested and FS requested fields.

Is there any reason why this can't be matched with a regex like (?:.?Memory Requested:\s+)(.*\s\s) (tested on Rubular), followed by trimming the blank spaces with gsub? I can get the Memory Used, Example Used and FS used fields with it, but I don't understand why exactly I can't get the first one (requested).
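The `(.*\s\s)` capture is greedy: `.*` runs to the end of the line and backs off only far enough to leave two whitespace characters for `\s\s`. For a label in the left-hand column it therefore swallows the right-hand column too; the right-hand labels only appear to work because nothing follows them. A plain-Ruby sketch (sample line assumed, with trailing spaces as in the document) comparing it with a lazy `(.*?)\s\s`, which stops at the first double space:

```ruby
# Sample line from the thread, including trailing spaces.
line = "   Memory Requested:   4.5TB                 Memory Used: 668.34GB        "

# Greedy: .* runs to the end of the line, then backs off just enough for \s\s.
greedy = line[/(?:.?Memory Requested:\s+)(.*\s\s)/, 1].strip

# Lazy: .*? stops at the first pair of whitespace characters after the value.
lazy = line[/(?:.?Memory Requested:\s+)(.*?)\s\s/, 1]

p greedy  # starts with "4.5TB" but runs on to "Memory Used: 668.34GB"
p lazy    # "4.5TB"
```

The lazy form still assumes at least two whitespace characters after every value; a value at the very end of a line with no trailing spaces would not match.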

Use

    matches = message.scan(/\s*([a-zA-Z ]+):\s+([\d:.BKMGT]+)/)
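With `:\s+`, any run of whitespace after the colon is accepted, so the left-hand labels match as well. A plain-Ruby sketch running that regex over the sample block from this thread (the heredoc approximates the original column spacing, and a Hash named `fields` stands in for the Logstash event):

```ruby
# Sample block from the thread (column spacing approximated).
message = <<~TEXT
  ========================================
                    Resource Usage on 2021-06-27 00:18:22:
                                             CPU Time Used: 1282:26:07
     Memory Requested:   4.5TB                 Memory Used: 668.34GB
     Example requested: 24:00:00               Example Used: 01:14:11
     FS requested:    400.0GB                  FS used: 8.16MB
  ========================================
TEXT

fields = {}

# Label of letters/spaces, a colon, one or more whitespace characters, then a
# value built from digits, ':', '.', and the unit letters B, K, M, G, T.
message.scan(/\s*([a-zA-Z ]+):\s+([\d:.BKMGT]+)/).each do |label, value|
  fields[label.downcase.gsub(/ /, "_")] = value
end

fields.each { |key, value| puts "#{key} => #{value}" }
```

The "Resource Usage on ..." header produces no field, because every colon on that line directly follows a digit rather than a label.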

If the names of the fields are fixed, you can use a bytes filter to convert the values to numbers.
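Inside Logstash the bytes filter does this conversion. To show what turning a size string into a number involves, here is a plain-Ruby sketch. The `size_to_bytes` helper and `UNITS` table are hypothetical and assume binary multipliers (1 KB = 1024 bytes), which may differ from the bytes filter's own conventions, so check its documentation before relying on either:

```ruby
# Hypothetical unit table, not part of Logstash: exponent of 1024 per unit.
UNITS = { "B" => 0, "KB" => 1, "MB" => 2, "GB" => 3, "TB" => 4 }.freeze

# Convert strings such as "4.5TB" or "668.34GB" to an integer byte count.
def size_to_bytes(text)
  match = text.strip.match(/\A(\d+(?:\.\d+)?)\s*([KMGT]?B)\z/i)
  raise ArgumentError, "unrecognised size: #{text.inspect}" unless match

  (match[1].to_f * 1024**UNITS.fetch(match[2].upcase)).round
end

p size_to_bytes("4.5TB")  # 4947802324992
p size_to_bytes("668.34GB")
p size_to_bytes("8.16MB")
```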

Well, now it works, and even the field names are correct. Thank you, sir!
