Convert strings with different data units (MB, GB, TB) to bytes

Hello!
I use grok to parse a log file that contains fields with different data units; the fields are saved as strings. JSON example:

"totalserved": "4.4 MB",
"totalrequested": "4.4 MB",
"cacheserved": "4.4 MB",
"internetserved": "0 GB",
"peersserved": "4.5 GB"

Now I want Elasticsearch to automatically recognize them as bytes (not MB or GB) rather than as strings.
Is this possible?

I don't believe it's doable in Elasticsearch today.
It could be a nice mapper to have, like the date type mapper we already have.

May I suggest that you open a feature request and see what the team answers?

Also, you can probably build your own plugin to achieve that.

Better than that: with Elasticsearch 5.0 you'll be able to write an ingest plugin which can convert something like that into a value in bytes.

Thank you very much for your answer :slight_smile:

I am now trying to parse out the unit for every field and then calculate the bytes into a new field. I am excited to see if it works!
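For anyone attempting the same thing, the parse-and-multiply step could look something like this minimal Ruby sketch. It assumes binary multipliers (1 KB = 1024 bytes) and a "&lt;number&gt; &lt;unit&gt;" format with a space, as in the JSON above; the `to_bytes` helper name is made up for illustration:

```ruby
# Map unit suffixes to binary multipliers (assumption: 1 KB = 1024 bytes).
UNIT_MULTIPLIERS = {
  "B"  => 1,
  "KB" => 1024,
  "MB" => 1024**2,
  "GB" => 1024**3,
  "TB" => 1024**4
}

# Convert a string like "4.4 MB" or "0 GB" to an integer byte count.
def to_bytes(value)
  number, unit = value.split
  (number.to_f * UNIT_MULTIPLIERS.fetch(unit.upcase)).round
end

puts to_bytes("4.4 MB")  # 4613734
puts to_bytes("4.5 GB")  # 4831838208
```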

ICU has a MeasureFormat http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MeasureFormat.html
with a parse method.

This can work with http://icu-project.org/apiref/icu4j/com/ibm/icu/util/MeasureUnit.html#MEGABYTE and http://icu-project.org/apiref/icu4j/com/ibm/icu/util/MeasureUnit.html#GIGABYTE and much more.

I can add this as an analyzer / token filter to my ICU plugin at https://github.com/jprante/elasticsearch-icu

You could manage it in Logstash. Here's an example from something I did; your approach would have to vary slightly (it might actually be easier, since this example is for strings with units appended without any whitespace).

# split the "store" into number, units prefix, and units base
    grok {
        # would like to put store_units_prefix and store_units_base in @metadata, too
        match => { "store" => "^%{BASE10NUM:[@metadata][store_number]:float}(?<store_units_prefix>[kKmMgGtT])(?<store_units_base>[b])$" }
    }
    mutate {
        add_field => {
            "[@metadata][store_units_prefix]" => "%{store_units_prefix}"
            "[@metadata][store_units_base]" => "%{store_units_base}"
        }
        remove_field => [ "store_units_prefix", "store_units_base" ]
    }
    if [@metadata][store_units_prefix] == "k" or [@metadata][store_units_prefix] == "K" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1024 } }
    } else if [@metadata][store_units_prefix] == "m" or [@metadata][store_units_prefix] == "M" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1048576 } }
    } else if [@metadata][store_units_prefix] == "g" or [@metadata][store_units_prefix] == "G" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1073741824 } }
    } else if [@metadata][store_units_prefix] == "t" or [@metadata][store_units_prefix] == "T" {
        mutate { add_field => { "[@metadata][store_multiplier]" => 1099511627776 } }
    }
# I don't know how to specify type in mutate.add_field, so I convert it
    mutate {
        convert => { "[@metadata][store_multiplier]" => "integer" }
    }
# create a new field with the size in bytes
    ruby {
        code => "event['store_size'] = event['@metadata']['store_number']*event['@metadata']['store_multiplier']"
    }
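One caveat, hedged: the ruby filter above uses the older Logstash 2.x event API (`event['field']`). On Logstash 5.0 and later, the same step would need the newer `event.get`/`event.set` API, something like:

```
    ruby {
        code => "event.set('store_size', event.get('[@metadata][store_number]') * event.get('[@metadata][store_multiplier]'))"
    }
```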

Thank you for sharing your configuration :slight_smile: This is exactly what I am
coding right now - glad to hear that this works!