Background
Given the following input (these are snippets from the input JSON Lines):
"MEMLIMIT Size":0
...
"MEMLIMIT Size":0.8590E+10
Logstash (I'm using 7.9.3) outputs:
"MEMLIMIT Size" => 0
"MEMLIMIT Size" => 8590000000.0
I'm fine with the 0 output value.
However, I'm not fine with the trailing .0 on 8590000000.0, because it causes the following mapping error:
Could not index event ... mapper [MEMLIMIT Size] cannot be changed from type [long] to [float]
I do not want to configure Logstash (or Elasticsearch, for that matter) to perform any special processing on the field "MEMLIMIT Size", because this is just one example of such a field. Other fields might also have integer values represented in scientific notation.
I have some inkling of why Logstash might do this: this particular JSON Lines input, which I helped to design, deliberately specifies a trailing .0 on integer values for fields that might contain a float value, to stop Elasticsearch from attempting to index a float value into a field that has been incorrectly mapped as an integer based on the first indexed value.
However, in this case, the trailing .0 is undesirable.
Questions
- How do I prevent Logstash from appending that .0? Especially when the "10" in "E+10" shifts the decimal point way past the number of digits specified in the original value? My "inkling" aside, it's presumptuous of Logstash to specify that level of precision.
- Alternatively, is there a way to prevent Logstash from expanding the scientific notation? Or would that just move the same problem to Elasticsearch? Would Elasticsearch expand that notation and "append" the .0, resulting in the same problem? (I haven't tested sending such scientific notation directly to Elasticsearch; a sketch of how I might test that follows below.)
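To answer that last question for myself, I could presumably index one document containing the raw scientific-notation literal into a scratch index and then look at the mapping Elasticsearch infers. Here's an untested Ruby sketch of that test (the index name memlimit-test and the localhost:9200 endpoint are just placeholders for my setup):

require "net/http"
require "json"

# Untested: send the scientific-notation literal to Elasticsearch verbatim,
# then ask which type it inferred for the field (long vs. float).
Net::HTTP.start("localhost", 9200) do |http|
  post = Net::HTTP::Post.new("/memlimit-test/_doc", "Content-Type" => "application/json")
  post.body = '{"MEMLIMIT Size": 0.8590E+10}'  # raw JSON body, so no Ruby float conversion happens here
  puts http.request(post).body

  mapping = http.request(Net::HTTP::Get.new("/memlimit-test/_mapping"))
  puts JSON.pretty_generate(JSON.parse(mapping.body))
end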
Possible answer
Perhaps: iterate over all numeric fields; if a field value is greater than, say, 99999999999, convert (mutate?) it to an integer (i.e., truncate any decimal fraction). If that sounds doable, I'd appreciate help coding an efficient solution (with a Ruby .each? I'm a Ruby newbie).
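Something like the following is roughly what I have in mind: a ruby filter that walks every top-level field and truncates suspiciously large floats back to integers. This is an untested sketch, and the 99999999999 threshold is just my guess at "big enough to only have come from expanded scientific notation":

filter {
  ruby {
    code => '
      # Untested sketch: convert very large floats back to integers so the
      # trailing .0 added by scientific-notation expansion goes away.
      event.to_hash.each do |key, value|
        if value.is_a?(Float) && value > 99_999_999_999
          event.set(key, value.to_i)
        end
      end
    '
  }
}

This only touches top-level fields (nested fields would need a recursive walk), and any genuinely fractional value above the threshold would lose its fraction, which is the trade-off of this heuristic.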
Config
Here's my Logstash config (with output set to stdout for testing; normally, it outputs to elastic; yes, I understand that line is the default codec for stdin).
input {
  stdin {
    codec => line
  }
}
filter {
  # Extract the leading ISO 8601 timestamp into _time.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:_time}" }
  }
  date {
    match => [ "_time", "ISO8601" ]
  }
  # Parse the JSON payload, then drop the working fields.
  json {
    source => "message"
    remove_field => [ "_time", "message" ]
  }
  mutate {
    lowercase => [ "code" ]
  }
}
output {
  stdout {
  }
}
At this point, though, I need to focus on ingesting this data in Elastic.