Background
Given the following input (these are snippets from the input JSON Lines):
"MEMLIMIT Size":0
...
"MEMLIMIT Size":0.8590E+10
Logstash (I'm using 7.9.3) outputs:
"MEMLIMIT Size" => 0
"MEMLIMIT Size" => 8590000000.0
I'm fine with the `0` output value.

However, I'm not fine with the trailing `.0` on `8590000000.0`, because it causes the following mapping error:

```
Could not index event ... mapper [MEMLIMIT Size] cannot be changed from type [long] to [float]
```
I do not want to configure Logstash (or Elasticsearch, for that matter) to perform any special processing on the field "MEMLIMIT Size", because this is just one example of such a field. Other fields might also have integer values represented in scientific notation.
I have some inkling why Logstash might do this: this particular input JSON Lines format, which I helped to design, deliberately specifies a trailing `.0` on integer values for fields that might contain a float value, to avoid Elasticsearch attempting to index a float value into a field that has been incorrectly mapped as an integer based on the first indexed value. However, in this case, the trailing `.0` is undesirable.
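For illustration only (the field name below is invented), the design convention means a float-capable field whose first value happens to be whole is still emitted with `.0`, so dynamic mapping picks `float` from the first event rather than `long`:

```
{"CPU Percent":0.0}
{"CPU Percent":37.5}
```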
Questions
- How do I prevent Logstash from appending that `.0`? Especially when the "10" in `E+10` shifts the decimal point well past the number of digits specified in the original value. My "inkling" aside, it's presumptuous of Logstash to assert that level of precision.
- Alternatively, is there a way to prevent Logstash from expanding the scientific notation? Or would that just move the same problem to Elasticsearch? Would Elasticsearch expand that notation and "append" the `.0`, resulting in the same problem? (I haven't tested sending such scientific notation directly to Elasticsearch; a test sketch follows below.)
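Regarding the second question, here is an untested sketch of how one might check Elasticsearch's behaviour directly (assuming a local instance on `localhost:9200`; the index name `memtest` is invented for the test):

```
# Index one document containing the raw scientific notation.
curl -s -X POST "localhost:9200/memtest/_doc?pretty" \
  -H "Content-Type: application/json" \
  -d '{"MEMLIMIT Size":0.8590E+10}'

# Then inspect which type dynamic mapping chose for the field.
curl -s "localhost:9200/memtest/_mapping?pretty"
```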
Possible answer
Perhaps: iterate over all numeric fields; if a field value is greater than, say, 99999999999, convert (mutate?) it to an integer (i.e. truncate any decimal fraction). If that sounds doable, I'd appreciate help coding an efficient solution (with a Ruby `.each`? I'm a Ruby newbie).
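As a starting point, here is a minimal sketch of that idea as a Logstash `ruby` filter, placed after the `json` filter so the parsed fields exist. The threshold is the one from the question; treating any larger `Float` as expanded scientific notation is an assumption, and the loop only visits top-level fields (nested objects would need recursion):

```
ruby {
  code => '
    # Assumption: any Float above this threshold (taken from the question)
    # is presumed to be expanded scientific notation, so truncate it.
    threshold = 99_999_999_999
    event.to_hash.each do |key, value|
      next if key.start_with?("@")   # leave @timestamp, @version alone
      if value.is_a?(Float) && value > threshold
        event.set(key, value.to_i)   # drops any decimal fraction
      end
    end
  '
}
```

Note that `event.to_hash` returns a copy of the event, which is why the snippet writes changes back with `event.set`; on very wide events it might be cheaper to list candidate fields explicitly.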
Config
Here's my Logstash config (with output set to `stdout` for testing; normally, it outputs to Elasticsearch; yes, I understand that `line` is the default codec for `stdin`).
```
input {
  stdin {
    codec => line
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:_time}" }
  }
  date {
    match => [ "_time", "ISO8601" ]
  }
  json {
    source => "message"
    remove_field => [ "_time", "message" ]
  }
  mutate {
    lowercase => [ "code" ]
  }
}
output {
  stdout {
  }
}
```
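For reference, one way to exercise this config from the shell (the config file name, timestamp, and extra field values are invented for the test; adjust the `bin/logstash` path for your install):

```
echo '{"timestamp":"2021-01-01T00:00:00.000Z","MEMLIMIT Size":0.8590E+10,"code":"ABC"}' \
  | bin/logstash -f test.conf
```

The `stdout` output should reproduce the `8590000000.0` value described above.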