JSON mapping conflict causes logs to be lost

When a JSON field is logged as a string and subsequently as an object (or vice versa), Elasticsearch throws a stack trace and does not index the document, and there is no recovery from Logstash, so the log line is completely LOST.

Here's an example:
Write this to a log:
{ "foo": "bar" }
then write this to the same log:
{ "foo": {"apple":"one", "banana":"two"}}

Both are valid JSON, but in the first case "foo" has a string value, while in the second case "foo" has an object value.

Logstash shows both events via output { stdout {} }:

2017-10-26T21:29:01.218Z i-00a0dc3b8715bdf83 { "foo": "bar" }
2017-10-26T21:29:36.219Z i-00a0dc3b8715bfe83 { "foo": {"apple":"one", "banana":"two"}}

But Elasticsearch throws a stack trace (which one depends on whether the first value was an object and the next a string, or vice versa):
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [foo] tried to parse field [foo] as object, but found a concrete value
or
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [foo]

The net result is that the log line is never recorded in Elasticsearch.

The filter in Logstash is json { source => "message" }.
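
For reference, here is a minimal sketch of the pipeline (the beats port and elasticsearch hosts are placeholders here, not our exact production values):

input {
   beats {
      port => 5044
   }
}
filter {
   json {
      source => "message"
   }
}
output {
   elasticsearch {
      hosts => ["localhost:9200"]
   }
   stdout { codec => rubydebug }
}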

skip_on_invalid_json has no effect (because the JSON is valid). I've tried disabling the Logstash json filter, and instead relying on Filebeat, with this being the relevant config:

json.message_key: message
json.keys_under_root: true
json.add_error_key: true

But the same issue occurs in Elasticsearch.

Obviously, the best solution is to have consistent, clean logs, which we're working towards, but in the interim is there a way I can prevent this from happening? I'm posting this here in Logstash instead of Elasticsearch because 1) Logstash is supposed to have means to ensure logs aren't lost, and 2) I'm hoping there is a way of filtering such events out and recording the field as a string (if nothing else), while still keeping JSON filtering enabled.

Thanks for any suggestions or pointers.

How about writing a Logstash filter to rename the field if it's a string so that the type of each field is consistent?
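
Untested, but something along these lines, using [foo] from your example (the target name foo_text is just an example; a regex comparison only matches when the field holds a string):

filter {
   if [foo] =~ /.*/ {
      mutate {
         rename => { "foo" => "foo_text" }
      }
   }
}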

Great suggestion Magnus, but now I'm running into another (similar) problem. Let me add more details and expand on the case above:

String case log:
{"type":"app","level":"INFO","thread":"main","message":"foobar"}
Object case log:
{"type":"app","level":"INFO","thread":"main","message":{"apple":"one", "banana":"two"} }

In reality the entire log line is JSON, but it contains a nested message field which may or may not itself contain a JSON object.

I applied your suggestion as follows to my logstash config:

filter {
	json {
		source => "message" #need to extract from the actual message sent by filebeat
		skip_on_invalid_json => false
	}
	#now the message field should just contain the text or the object
	if [message] =~ /.*/ {
		mutate {
			rename => { "message" => "msg" }
		}
	}
}

In principle, if [message] is just text it matches and gets renamed, and if it's a JSON object it doesn't match the =~ /.*/ regex. This is confirmed by looking at the rubydebug stdout: when it's just "foobar", it gets renamed to a field called "msg" (and the "message" field is gone). When it's the apple/banana object, we get:

     "message" => {
    "banana" => "two",
     "apple" => "one"
}

So everything should be good. But unfortunately it's not. If I start with a clean Elasticsearch instance and the first message happens to be the string case above (which I verified at the console creates a msg field and doesn't pass a "message" field back to ES), followed by an object case, ES throws this error:
java.lang.IllegalArgumentException: [message] is defined as an object in mapping [app] but this name is already used for a field in other types

Not sure where to go from here, since Logstash doesn't even seem to pass or create a "message" field (it creates msg instead), but it is clearly still passing something back to ES that causes the initial mapping to be created as a string. :frowning:

Ok, with a bit of egg on my face after posting that, it occurred to me that syslog is also getting dumped into the message field in Elasticsearch, which triggers the mapping to be set to a string. I added this to handle all my non-app messages:

if [type] == "system" {
      mutate {
         gsub => ["source", "/opt/filebeat/logs/os", "/var/log"]
         rename => { "message" => "msg" }
      }
}

And now Elasticsearch seems happy to ingest everything. I'm not sure why my initial test case worked when I sent an object as my first app message, but this seems to handle all cases I've tested in the past few minutes.

... and finally, to solve this problem for anyone looking for the simplest solution: rename any field that may contain either a JSON object or plain text, to prevent a mapping collision:

filter {
   json {
      source => "message"
      tag_on_failure => ["error_json_parse"]
      skip_on_invalid_json => false
   }
   if [message] !~ /.*/ {
      mutate {
         rename => { "message" => "json" }
      }
   }
}
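
(Note that the check is inverted compared to my earlier attempt: [message] !~ /.*/ is only true when [message] is not a string, i.e. when the json filter has already parsed it into an object, so objects get moved to [json] while plain text stays in [message].)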

You'll need to do this for any JSON fields that may periodically contain text, which means there is still the potential for Elasticsearch to fail to index a message while Logstash thinks it's safe. So a follow-up question: is there a way to have Logstash write output to somewhere else for anything that fails to get indexed successfully in Elasticsearch?

is there a way to have Logstash write output to somewhere else for anything that fails to get indexed successfully in Elasticsearch?

I believe the newly introduced dead letter queue feature does just this.
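
Roughly, and hedging on the details since it's version dependent: you enable it with dead_letter_queue.enable: true in logstash.yml, and events that Elasticsearch rejects (e.g. with a mapping error) get written to the queue instead of being dropped. A second pipeline can then read them back with the dead_letter_queue input, something like (the paths here are just examples):

input {
   dead_letter_queue {
      path => "/var/lib/logstash/dead_letter_queue"
      commit_offsets => true
   }
}
output {
   # write rejected events somewhere safe for inspection, e.g. a local file
   file {
      path => "/var/log/logstash/dlq_events.log"
   }
}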
