I think the following behaviour is a bug because it causes some data loss.
Please assume the following log file /tmp/sample_json.log:
{"someDate":"2016-09-28T01:40:26.760+0000", "someNumberAsString": "1475026826760", "someNumber": 1475026826760, "someString": "foobar", "someString2": "2017 is awesome"}
Now consider the following Filebeat 5.2 configuration:
filebeat.prospectors:
- input_type: log
  paths:
    - /tmp/sample_json.log

processors:
- decode_json_fields:
    fields: ["message"]
    target: "foobar"
    max_depth: 10

output.logstash:
  hosts: ["localhost:5044"]
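For completeness, the Logstash side is just a minimal pipeline along these lines (reconstructed here for reproduction; any pipeline with a beats input on port 5044 and a rubydebug stdout will show the same):

input {
  beats {
    port => 5044
  }
}

output {
  stdout {
    codec => rubydebug
  }
}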
This is the decoded event in the Logstash rubydebug output:
{
    "message" => "{\"someDate\":\"2016-09-28T01:40:26.760+0000\", \"someNumberAsString\": \"1475026826760\", \"someNumber\": 1475026826760, \"someString\": \"foobar\", \"someString2\": \"2017 is awesome\"}",
    "@version" => "1",
    "@timestamp" => "2017-02-03T08:53:57.579Z",
    "beat" => {
        "name" => "fabien1",
        "hostname" => "fabien1",
        "version" => "5.2.0"
    },
    "source" => "/tmp/sample_json.log",
    "offset" => 170,
    "foobar" => {
        "someDate" => 2016,
        "someNumber" => 1475026826760,
        "someNumberAsString" => 1475026826760,
        "someString" => "foobar",
        "someString2" => 2017
    },
    "type" => "log",
    "input_type" => "log",
    "host" => "fabien1",
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}
The problem here is that we lose the fact that someString2 was not just a number but a full string:
- if for other records the value is "What an awesome year!", we end up with a type conflict: sometimes an int, sometimes a string
- the same happens to "someDate", which becomes the bare integer 2016, losing the rest of the timestamp (see the expected output sketched after this list)
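For comparison, this is what the foobar object should look like if the processor preserved the original JSON types (sketched by hand in the same rubydebug style, not actual output):

"foobar" => {
    "someDate" => "2016-09-28T01:40:26.760+0000",
    "someNumber" => 1475026826760,
    "someNumberAsString" => "1475026826760",
    "someString" => "foobar",
    "someString2" => "2017 is awesome"
}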
In conclusion, this parsing breaks the schema of the data: only JSON fields that actually contained numbers should be decoded as numbers; strings should stay strings.
I agree that changing this behaviour might break compatibility for people who rely on getting integers whenever possible, but the current default behaviour is unusable in most cases.
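For what it's worth, a minimal Go sketch (Beats itself is written in Go) shows that the standard encoding/json decoder already preserves these types, so the lossy string-to-number and string-to-date conversion must come from the processor's own post-processing rather than from JSON decoding itself. This is an illustration, not code from the Beats repository:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The exact log line from /tmp/sample_json.log.
	line := `{"someDate":"2016-09-28T01:40:26.760+0000", "someNumberAsString": "1475026826760", "someNumber": 1475026826760, "someString": "foobar", "someString2": "2017 is awesome"}`

	var decoded map[string]interface{}
	if err := json.Unmarshal([]byte(line), &decoded); err != nil {
		panic(err)
	}

	// encoding/json keeps the original JSON types: the four string
	// values stay strings, and only someNumber comes back as a
	// number (float64, Go's default for JSON numbers).
	for key, value := range decoded {
		fmt.Printf("%s (%T): %v\n", key, value, value)
	}
}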