I'd like to share this behaviour to find out whether it is expected or desired.
Problem:
- A field exists in memory (e.g. log.syslog.hostname: ZEUS1)
- A JSON message arrives that includes a sub-field of the existing object (e.g. log.level: INFO)
- The previously existing field disappears and is not present in the output
Expected behaviour:
Both fields (the existing one and the new one) are present in the output.
To reproduce:
Logstash 8.3.3
input {
  generator {
    message => '{ "log": { "level": "INFO" } }'
    count => 1
  }
}
filter {
  mutate { add_field => { "[log][syslog][hostname]" => "ZEUS1" } }
  json {
    source => "message"
    remove_field => [ "message" ]
  }
}
output {
  stdout {}
}
Output:
{
  "event" => {
    "original" => "{ \"log\": { \"level\": \"INFO\" } }",
    "sequence" => 0
  },
  "@timestamp" => 2022-08-13T06:54:52.835787Z,
  "log" => {
    "level" => "INFO"
  },
  "host" => {
    "name" => "local"
  },
  "@version" => "1"
}
This is especially impactful when the source data is not under one's control and can change over time.
A way to mitigate this is to unpack the JSON message under target
and rename trusted fields back to the root. But this is unsustainable when dealing with large and changing schemas.
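For illustration, the workaround above could look roughly like this (the [parsed] target name and the single renamed field are only examples, not part of the original pipeline):

```
filter {
  mutate { add_field => { "[log][syslog][hostname]" => "ZEUS1" } }
  json {
    source => "message"
    target => "[parsed]"            # unpack into a temporary object instead of the root
    remove_field => [ "message" ]
  }
  mutate {
    # each trusted field has to be renamed back individually
    rename => { "[parsed][log][level]" => "[log][level]" }
    remove_field => [ "parsed" ]    # removed after the rename has been applied
  }
}
```

This keeps [log][syslog][hostname] intact, but every field of a big or evolving schema would need its own rename entry, which is the unsustainable part.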
Should this behaviour change?
Thanks