They are mapped as keyword, but I would like to have the "kb" (or whatever file size denominator) stripped and to store them as a numeric type so I can query on them. What would be the best way to do this?
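The Bytes processor looks like it might be what I want. My rough idea (untested - the pipeline name and target field are just placeholders I made up) is to have it read the keyword field and write the converted value into a separate numeric field, along these lines:

PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "bytes": {
        "field": "resources_used.mem",
        "target_field": "resources_used.mem_bytes",
        "ignore_missing": true
      }
    }
  ]
}

With a numeric mapping on the target field, that should then be usable in range queries and aggregations.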
Reviving this topic - I may be misunderstanding how the Bytes processor works. The documentation says:
Converts a human readable byte value (e.g. 1kb) to its value in bytes (e.g. 1024). If the field is an array of strings, all members of the array will be converted.
Supported human readable units are "b", "kb", "mb", "gb", "tb", "pb" case insensitive. An error will occur if the field is not a supported format or resultant value exceeds 2^63.
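My reading of that is that a simulate call along these lines (just a sketch - the "size" field and the sample doc are made up for the test) should turn "1kb" into 1024:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "bytes": { "field": "size" } }
    ]
  },
  "docs": [
    { "_source": { "size": "1kb" } }
  ]
}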
When running the bytes processor on my own field, though, it seems to have no effect. This is the input on the test document:
That makes sense. I do some scripting in the ingest pipeline that makes resources_used.mem end up that way. The field in the source document is "message". I parse through that field in the context using the script below, splitting the string on " " and then on "=" and appending the values to the context (in this case the source would contain "resources_used.mem=3528kb").
I presume the formatting error stems from this, but my Painless skills are quite weak:
// Nothing to do if the message field is missing or empty.
if (ctx['message'] != null && !ctx['message'].empty) {
    // Split the raw message into space-separated "label=value" tokens.
    String[] messSplit = ctx['message'].splitOnToken(' ');
    for (item in messSplit) {
        if (item.contains("Resource_List.select")) {
            // The select statement can itself contain '=', so only split on the first one.
            String[] splitItem = /=/.split(item, 2);
            String label = splitItem[0];
            String data = splitItem[1];
            ctx[label] = data;
        } else {
            String[] splitItem = item.splitOnToken("=");
            // Skip tokens that are not label=value pairs.
            if (splitItem.length <= 1) {
                continue;
            }
            String label = splitItem[0];
            String data = splitItem[1];
            ctx[label] = data;
        }
    }
}
The ctx[label] = data line is what writes "resources_used.mem": "3528kb" to the field of the same name, which is mapped as text.
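For what it's worth, my working assumption (untested) is that the bytes processor has to come after that script processor in the same pipeline, so resources_used.mem already exists by the time it runs, roughly:

{
  "processors": [
    { "script": { "lang": "painless", "source": "..." } },
    { "bytes": { "field": "resources_used.mem", "ignore_missing": true } }
  ]
}

where "source" is the script above.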