Convert strings with different data units (MB, GB, TB) to bytes

Hello!
I use grok to parse a log file with different data units. The fields are saved as strings; a JSON example:

"totalserved": "4.4 MB",
"totalrequested": "4.4 MB",
"cacheserved": "4.4 MB",
"internetserved": "0 GB",
"peersserved": "4.5 GB"

Now I want them to be automatically recognized by Elasticsearch as bytes (not MB or GB), and not as a string.
Is this possible?

I don't believe it's doable in Elasticsearch today.
It could be a nice mapper to have, like the date type mapper we already have.

May I suggest that you open a feature request and see what the team answers?

Also you can probably build your own plugin to achieve that.


Better than that: with Elasticsearch 5.0 you'll be able to write an ingest plugin which can convert something like that into a value in bytes.


Thank you very much for your answer :slight_smile:

I am trying now to parse out the units for every field and then calculate the bytes into a new field. I am excited to see if it works!
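For anyone following along, the per-field conversion can be sketched in plain Ruby. This is not Logstash-specific code; the `to_bytes` helper is my own name, and the 1024-based multipliers are an assumption (swap in powers of 1000 if your log source reports decimal units):

```ruby
# Byte multipliers per unit (binary, 1024-based; this is an assumption,
# use powers of 1000 instead if the source reports decimal units).
MULTIPLIERS = {
  "B"  => 1,
  "KB" => 1024,
  "MB" => 1024**2,
  "GB" => 1024**3,
  "TB" => 1024**4,
}.freeze

# Parse a string like "4.4 MB" into an integer byte count.
def to_bytes(value)
  number, unit = value.split(/\s+/)
  (number.to_f * MULTIPLIERS.fetch(unit.upcase)).round
end

to_bytes("4.4 MB")  # => 4613734
to_bytes("4.5 GB")  # => 4831838208
```

The `fetch` call raises on an unknown unit, which is usually preferable to silently indexing a wrong number.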

ICU has a MeasureFormat (http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MeasureFormat.html) with a parse method.

This can work with http://icu-project.org/apiref/icu4j/com/ibm/icu/util/MeasureUnit.html#MEGABYTE and http://icu-project.org/apiref/icu4j/com/ibm/icu/util/MeasureUnit.html#GIGABYTE and much more.

I can add this as an analyzer / token filter to my ICU plugin at https://github.com/jprante/elasticsearch-icu


You could manage it in Logstash. Here's an example from something I did; your approach would have to vary slightly (it might actually be easier, since this example is for strings with units appended without any whitespace).

    # split the "store" into number, units prefix, and units base
    grok {
        # would like to put store_units_prefix and store_units_base in @metadata, too
        match => { "store" => "^%{BASE10NUM:[@metadata][store_number]:float}(?<store_units_prefix>[kKmMgGtT])(?<store_units_base>[b])$" }
    }
    mutate {
        add_field => {
            "[@metadata][store_units_prefix]" => "%{store_units_prefix}"
            "[@metadata][store_units_base]" => "%{store_units_base}"
        }
        remove_field => [ "store_units_prefix", "store_units_base" ]
    }
    if [@metadata][store_units_prefix] == "k" or [@metadata][store_units_prefix] == "K" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1024 } }
    } else if [@metadata][store_units_prefix] == "m" or [@metadata][store_units_prefix] == "M" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1048576 } }
    } else if [@metadata][store_units_prefix] == "g" or [@metadata][store_units_prefix] == "G" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1073741824 } }
    } else if [@metadata][store_units_prefix] == "t" or [@metadata][store_units_prefix] == "T" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1099511627776 } }
    }
    # I don't know how to specify type in mutate.add_field, so I convert it
    mutate {
        convert => { "[@metadata][store_multiplier]" => "integer" }
    }
    # create a new field with the size in bytes
    ruby {
        code => "event['store_size'] = event['@metadata']['store_number']*event['@metadata']['store_multiplier']"
    }
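Outside Logstash, the conditional chain above boils down to a lookup of a power of 1024 by unit prefix. Here is a minimal Ruby sketch of that same logic (the `store_size` helper and its regex are mine, loosely mirroring the grok pattern in the example):

```ruby
# Powers of 1024 keyed by unit prefix, matching the conditionals above.
PREFIX_MULTIPLIERS = {
  "k" => 1024,
  "m" => 1024**2,   # 1048576
  "g" => 1024**3,   # 1073741824
  "t" => 1024**4,   # 1099511627776
}.freeze

# Convert a string like "4.4Gb" (number, prefix letter, lowercase "b",
# no whitespace) into a byte count, as the grok pattern above expects.
def store_size(store)
  m = store.match(/\A(\d+(?:\.\d+)?)([kKmMgGtT])b\z/)
  raise ArgumentError, "unparseable store value: #{store}" unless m
  m[1].to_f * PREFIX_MULTIPLIERS[m[2].downcase]
end

store_size("2Mb")  # => 2097152.0
store_size("1Tb")  # => 1099511627776.0
```

Keeping the multiplier table in one place also avoids repeating the long literals in each branch.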

Thank you for sharing your configuration :slight_smile: This is exactly what I am
coding right now - glad to hear that this works!