How to match two key-value pairs in an unstructured string laid out in columns

I have a document with some lines like:

  Example requested: 24:00:00            Example Used: 01:14:11        

What I want to have is:

{
"example_requested": "24:00:00",
"example_used": "01:14:11"
}

What I tried (this is a Ruby block):

if x =~ /(Example requested)/
    example_used = x.scan(/^(?:.*?: ){2}(.*)$/)[0]
    example_used = (example_used * "").gsub(/\s+/, "")
end

I get

"example_used": 01:14:11

but when I change {2} to {1} to get example_requested, all I am able to get is:
"example_requested" : "24:00:00ExampleUsed:01:14:11"

What is the best way to achieve this?
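The {1} variant returns both values because (.*) is greedy and runs to the end of the line, so everything after the first "label: " prefix is captured, including the second pair. A minimal plain-Ruby sketch that reproduces this outside Logstash (the `rest_after_labels` helper is hypothetical and only mirrors the attempt above):

```ruby
# Hypothetical helper mirroring the attempt above:
# skip `count` "label: " prefixes, capture the rest of the line, strip whitespace.
def rest_after_labels(line, count)
  captured = line.scan(/^(?:.*?: ){#{count}}(.*)$/)[0]
  (captured * "").gsub(/\s+/, "")
end

line = "  Example requested: 24:00:00            Example Used: 01:14:11        "

puts rest_after_labels(line, 2) # only the last value is left after two prefixes
puts rest_after_labels(line, 1) # the capture still contains the second label and value
```

Capturing a bounded value such as \d{2}:\d{2}:\d{2} instead of (.*) avoids swallowing the second column.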

You could use a kv filter

    kv {
        field_split_pattern => "\s+"
        value_split_pattern => ":"
        trim_value => " "
    }

to get

 "requested" => "24:00:00",
      "Used" => "01:14:11",

or you could use ruby

    ruby {
        code => '
            message = event.get("message")
            matches = message.scan(/\s*([a-zA-Z ]+): (\d{2}:\d{2}:\d{2})/)
            matches.each { |x|
                event.set(x[0].downcase.gsub(/ /, "_"), x[1])
            }
        '
    }

will get you

"example_requested" => "24:00:00",
     "example_used" => "01:14:11",

Both look fragile to me.
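To experiment with the scan pattern outside Logstash, here is a minimal sketch in plain Ruby. The `extract_pairs` helper and the hash it returns are assumptions for illustration; in the ruby filter above the pairs are written with event.set instead.

```ruby
# Hypothetical helper: collect "Label: HH:MM:SS" pairs from one line into a hash.
def extract_pairs(line)
  pairs = {}
  line.scan(/\s*([a-zA-Z ]+): (\d{2}:\d{2}:\d{2})/).each do |label, value|
    # Same key normalisation as in the ruby filter: lower-case, spaces to underscores.
    pairs[label.downcase.gsub(/ /, "_")] = value
  end
  pairs
end

line = "  Example requested: 24:00:00            Example Used: 01:14:11        "
p extract_pairs(line)
```

The pattern only accepts values shaped like HH:MM:SS after exactly one space, which is part of what makes it fragile.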

That's true, but I forgot to mention that sometimes lines don't contain just numbers, e.g. one of them looks like


Memory Requested:   3.5TB                 Memory Used: 668.41GB

Then change the regexp from (\d{2}:\d{2}:\d{2}) to something like ([\d:.BKMGT]).

(\d{2}:\d{2}:\d{2}) works for the case

"example_requested" => "24:00:00",
 "example_used" => "01:14:11",

but ([\d:.BKMGT]) leaves me with some strange results: what was Memory Requested: 4.5TB becomes tb______memory_requested => "6"

([\d:.BKMGT]+) perhaps

Now values are fine, but example_requested is messed up.

======================================================================================
                  Resource Usage on 2021-06-27 00:18:22:         
                                           CPU Time Used: 1282:26:07                                 
   Memory Requested:   4.5TB                 Memory Used: 668.34GB        
   Example requested: 24:00:00               Example Used: 01:14:11        
   FS requested:    400.0GB                  FS used: 8.16MB          
======================================================================================

output is:

          "tb_________________memory_used" : "668.34GB",
          "gb________________fs_used" : "8.16MB",
          "example_requested" : "24",
          "cpu_time_used" : "1282",
          "example_used" : "01"

I suggested ([\d:.BKMGT]+), not ([\d:.BKMGT+])

Sorry, I was wrongly editing your quote. Still, it doesn't find the Memory Requested and FS requested fields.

Is there any reason why this can't be matched with a regex such as (?:.?Memory Requested:\s+)(.*\s\s) (tested on Rubular), with the blank spaces then trimmed by gsub? I can get the Memory Used, Example Used and FS used fields with it, but I don't understand why exactly I can't get the first one (requested).

Use

matches = message.scan(/\s*([a-zA-Z ]+):\s+([\d:.BKMGT]+)/)
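The "requested" fields were skipped because a pattern with a single space after the colon cannot match columns padded with several spaces; the scan then resumes at the unit letters and glues them onto the next label. A small plain-Ruby comparison, as a sketch (the `scan_pairs` helper and the shortened sample line are assumptions for illustration):

```ruby
# Hypothetical helper: run a pattern over the text and normalise the labels into keys.
def scan_pairs(text, pattern)
  text.scan(pattern).map { |label, value| [label.downcase.gsub(/ /, "_"), value] }.to_h
end

single_space = /\s*([a-zA-Z ]+): ([\d:.BKMGT]+)/    # exactly one space after the colon
any_spaces   = /\s*([a-zA-Z ]+):\s+([\d:.BKMGT]+)/  # one or more whitespace characters

line = "Memory Requested:   4.5TB  Memory Used: 668.34GB"

p scan_pairs(line, single_space) # first pair is lost and "TB" leaks into the next key
p scan_pairs(line, any_spaces)   # both pairs are found with clean keys
```

Because \s also matches newlines, the second pattern can be run over the whole multi-line report in one scan.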

If the names of the fields are fixed, you can use a bytes filter to convert the values to numbers.
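To illustrate what such a conversion amounts to, here is a rough plain-Ruby sketch. It assumes binary multiples (1 KB = 1024 bytes); the `to_bytes` helper is hypothetical and is not the plugin's implementation.

```ruby
# Hypothetical helper: turn strings like "3.5TB" into a byte count.
# Assumption: binary multiples (1 KB = 1024 bytes).
def to_bytes(value)
  factors = { "B" => 1, "KB" => 1024, "MB" => 1024**2, "GB" => 1024**3, "TB" => 1024**4 }
  match = /\A(\d+(?:\.\d+)?)([KMGT]?B)\z/.match(value)
  return nil unless match # not a size, e.g. "24:00:00"

  (match[1].to_f * factors.fetch(match[2])).round
end

p to_bytes("3.5TB")
p to_bytes("400.0GB")
p to_bytes("24:00:00")
```

Values that are not sizes, such as the HH:MM:SS durations, come back as nil, so they can be left untouched.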

Well, now it works, and even the field names are correct. Thank you, sir.