When a field in a JSON log line is logged as a string and subsequently as an object (or vice versa), Elasticsearch throws a stack trace and does not record the document, and there is no recovery from Logstash, so the log line is completely LOST.
Here's an example:
Write this to a log:
{ "foo": "bar" }
then write this to the same log:
{ "foo": {"apple":"one", "banana":"two"}}
Both are valid JSON, but in the first case "foo" has a string value, while in the second case "foo" has an object value.
Logstash shows this in the output (with output { stdout {} }):
2017-10-26T21:29:01.218Z i-00a0dc3b8715bdf83 { "foo": "bar" }
2017-10-26T21:29:36.219Z i-00a0dc3b8715bfe83 { "foo": {"apple":"one", "banana":"two"}}
But Elasticsearch throws a stack trace (which one depends on whether the first value was an object and the next a string, or vice versa):
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [foo] tried to parse field [foo] as object, but found a concrete value
or org.elasticsearch.index.mapper.MapperParsingException: failed to parse [foo]
The net result is that the log line is never recorded in Elasticsearch: whichever type arrives first fixes the mapping for [foo], and any document with the other type is rejected.
The filter in Logstash is json { source => "message" }, and skip_on_invalid_json has no effect (because the JSON here is valid).
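For reference, here is roughly what the filter block looks like (skip_on_invalid_json is the only option I've added on top of the basic source setting):

filter {
  json {
    source => "message"
    skip_on_invalid_json => true
  }
}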
I've also tried disabling the Logstash json filter and instead relying on Filebeat, with this being the relevant config:
json.message_key: message
json.keys_under_root: true
json.add_error_key: true
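(For context, those options sit under the prospector in filebeat.yml, roughly like this - the path is just a placeholder and this is the Filebeat 5.x-style prospector syntax:)

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log
  json.message_key: message
  json.keys_under_root: true
  json.add_error_key: true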
But the same issue occurs in Elasticsearch.
Obviously, the best solution is to have consistent, clean logs, which we're working towards - but in the interim, is there a way I can prevent this from happening? I'm posting this here in Logstash instead of Elasticsearch because 1) Logstash is supposed to have means to ensure logs aren't lost, and 2) I'm hoping there is a way of filtering the conflicting value out and recording it as a string (if nothing else), while still keeping JSON filtering enabled.
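For example, I imagine something along these lines might work for a single known field - an untested sketch that stringifies "foo" when it arrives as an object, so the mapping stays consistent:

filter {
  json { source => "message" }
  ruby {
    init => "require 'json'"
    code => "value = event.get('foo'); event.set('foo', value.to_json) if value.is_a?(Hash)"
  }
}

But in my case the conflicting field isn't known in advance, so I'm not sure this generalises - hence the question.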
Thanks for any suggestions or pointers.