How to map memory size

I have fields that are currently being ingested as

resources_used.vmem: 1028974kb
resources_requested.vmem: 2000000kb

They are mapped as keyword, but I would like to have the "kb" (or whatever file-size denomination) stripped and to store them as a numeric type so I can query on them. What would be the best way to do this?

Use an ingest pipeline with this processor: Bytes processor | Elasticsearch Guide [8.14] | Elastic
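A minimal pipeline definition might look like this (the pipeline name is illustrative, and this assumes resources_used / resources_requested are real object fields in the source document; the bytes processor replaces the string with its value in bytes):

```
PUT _ingest/pipeline/convert-vmem
{
  "processors": [
    { "bytes": { "field": "resources_used.vmem" } },
    { "bytes": { "field": "resources_requested.vmem" } }
  ]
}
```

You can then attach it to the index with the index.default_pipeline setting, or pass it per request with ?pipeline=convert-vmem.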

Reviving this topic - I may be misunderstanding how the Bytes processor works. The docs say:

Converts a human readable byte value (e.g. 1kb) to its value in bytes (e.g. 1024). If the field is an array of strings, all members of the array will be converted.
Supported human readable units are "b", "kb", "mb", "gb", "tb", "pb" case insensitive. An error will occur if the field is not a supported format or resultant value exceeds 2^63.

When running the bytes processor on my field, it seems to have no effect. This is the input on a test document:

...,
"resources_used.mem": "3528kb",
...

This is the processor:


This is the output

"resources_used.mem": "3528kb",

The input type is a string. I can't find any other info on the processor, what the input type needs to be, or what it outputs as.

For reference this is Elastic 8.16.1

Go to Dev Tools to debug:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "bytes": {
          "field": "resources_used.mem",
          "ignore_failure": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "resources_used.mem": "3528kb"
      }
    },
    {
      "_source": {
        "resources_used": {
          "mem": "3528kb"
        }
      }
    }
  ]
}

Note that the first fails and the second works... What does the actual source document look like? Ingest pipeline processors do not work on dotted field names.

That makes sense. I do some scripting in the ingest pipeline that makes resources_used.mem end up that way. The field in the source is "message". I parse through that field using the script below, splitting the string on " " and then on "=", and appending the values to the context (in this case the source would be "resources_used.mem=3528kb").

I presume the formatting issue stems from this... but my Painless skill is quite weak:

if (ctx['message'] != null && !ctx['message'].isEmpty()) {
  // split the raw message into space-separated key=value tokens
  String[] messSplit = ctx['message'].splitOnToken(' ');
  for (item in messSplit) {
    if (item.contains("Resource_List.select")) {
      // the select value itself contains '=', so split on the first '=' only
      String[] splitItem = /=/.split(item, 2);
      String label = splitItem[0];
      String data = splitItem[1];
      ctx[label] = data;
    } else {
      String[] splitItem = item.splitOnToken("=");
      if (splitItem.length <= 1) {
        // token has no '=', skip it
        continue;
      }
      String label = splitItem[0];
      String data = splitItem[1];
      ctx[label] = data;
    }
  }
}

ctx[label] = data is what writes "resources_used.mem": "3528kb" to the top-level field of the same name, mapped as text.
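For what it's worth, one way to avoid the dotted top-level keys entirely (and so the need for dot_expander) would be to build the nested structure inside the script itself. A rough sketch, untested, assuming two-level labels like resources_used.mem:

```
// replace ctx[label] = data with something like:
if (label.contains(".")) {
  // splitOnToken takes a literal token, so '.' is safe here
  String[] parts = label.splitOnToken(".");
  if (ctx[parts[0]] == null) {
    ctx[parts[0]] = new HashMap();
  }
  // writes ctx['resources_used']['mem'] = '3528kb' as a real nested field
  ctx[parts[0]][parts[1]] = data;
} else {
  ctx[label] = data;
}
```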

Now it works for both... :slight_smile:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "dot_expander": {
          "field": "resources_used.mem"
        }
      },
      {
        "bytes": {
          "field": "resources_used.mem",
          "ignore_failure": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "resources_used.mem": "3528kb"
      }
    },
    {
      "_source": {
        "resources_used": {
          "mem": "3528kb"
        }
      }
    }
  ]
}
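One follow-up worth noting: the bytes processor replaces the string with a long in bytes (e.g. "3528kb" becomes 3612672), so the target field should be mapped as a numeric type rather than keyword. A sketch, with an illustrative index name:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "resources_used": {
        "properties": {
          "mem": { "type": "long" }
        }
      }
    }
  }
}
```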

That worked. Thanks! Nice to have a processor made for this exact purpose :grin:.