MongoDB (5+, JSON-based): some objects have variable types

Hi
(new here)
I've successfully managed to ship the logs of all of our MongoDB 5 instances (JSON-based), via Filebeat, to a Logstash instance.
MongoDB 5+ writes JSON-based logs (https://www.mongodb.com/docs/manual/reference/log-messages/). The "attr" object is the one containing all the log details, and it can be extremely versatile (it can hold simple context such as client name, IP, etc., or much more complex payloads like full pipeline outputs).
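For reference, a MongoDB 5+ log line looks roughly like this (the values here are made up for illustration; the top-level keys t, s, c, id, ctx, msg and attr are the ones described in the documentation linked above):

{"t":{"$date":"2022-08-19T08:10:07.123+00:00"},"s":"I","c":"NETWORK","id":22943,"ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.0.0.5:51234","connectionCount":12}}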
I'm using a simple JSON filter in my Logstash pipeline config, with a couple of simple renames for the sake of readability:

filter {
	json {
		# The whole log line is a JSON document, so parse it from "message".
		source => "message"
	}
	mutate {
		# Rename the terse MongoDB keys to readable field names.
		rename => {"[t][$date]" => "timestamp"}
		rename => {"s" => "log_level"}
		rename => {"c" => "component"}
		rename => {"ctx" => "context"}
		rename => {"attr" => "log_data"}
	}
}

The JSON is properly parsed, and I can create my index patterns (they can contain up to 700 fields, due to the versatile nature of the attr objects).

Here is an example:

attr.originatingCommand.$clusterTime.clusterTime.$timestamp.i	37
attr.originatingCommand.$clusterTime.clusterTime.$timestamp.t	1,660,896,607
attr.originatingCommand.$clusterTime.signature.hash.$binary.base64	kIB/B5Yt6HBDKu3aZQmYS4EoiFY=
attr.originatingCommand.$clusterTime.signature.hash.$binary.subType	0

The problem is that some nested JSON object properties can have different types over time, depending on the log. I have seen this for nearly 10 fields.

Mostly, properties that normally hold string, number, or boolean values sometimes receive a JSON object instead.

In this case we get these kinds of errors:

failed to parse field [attr.XXX.YYYY] of type [text]...Preview of field's value: <<mostly a JSON object>>

Here is a concrete example, for the key attr.error, which is a string in 99% of events:

"failed to parse field [attr.error] of type [text] in document with id 'oUZLsIIBg454KxwnKxz4'. Preview of field's value: '{keyPattern={_id=1}, code=11000, keyValue={_id=xxxxx}, codeName=DuplicateKey, errmsg=E11000 duplicate key error collection: yyyyyyy index: _id_ dup key: { _id: \"xxxxxx\" }}'"

As a first idea, I could differentiate on the component and route events to different indices, but I wouldn't be able to go much further with that approach (and it's not desirable anyway).
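For illustration, that routing would look roughly like this at the output stage (a sketch only; the host and index names are made up):

output {
	if [component] == "COMMAND" {
		elasticsearch {
			hosts => ["http://localhost:9200"]
			index => "mongodb-command-%{+YYYY.MM.dd}"
		}
	} else {
		elasticsearch {
			hosts => ["http://localhost:9200"]
			index => "mongodb-other-%{+YYYY.MM.dd}"
		}
	}
}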

I tried a lot of things: forcing these fields to be converted to strings (clearing the index each time), testing their type, and so on.
The only way I found to avoid the errors is to force a replacement by null for each of the identified fields. That may be acceptable for some of them, but not for "error".
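Concretely, that null workaround looks roughly like this (a sketch; attr.error is just one of the affected fields, and event.remove would work as well if dropping the field entirely is acceptable):

ruby {
	# Blank out the conflicting object so Elasticsearch no longer tries to
	# index a JSON object into a field mapped as text.
	code => 'event.set("[attr][error]", nil) unless event.get("[attr][error]").nil?'
}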

I may be missing something; maybe there is a better approach?

Thanks for your hints :slight_smile:

I'm adding some details.
After a while, the limit of 1000 fields per index was exceeded for a single index.

"reason"=>"Limit of total fields [1000] in index

This means the JSON approach as I tried it is not viable.
Is there a way, at the filter level (or maybe in Filebeat), to avoid recursively parsing some JSON objects? For instance, attr.command would remain a single field (raw JSON, not parsed), i.e. avoiding attr.command.arg1, attr.command.arg2, and so on?

Not that I know of. However, you can serialize some of the objects.

ruby { code => 'event.set("[attr][command]", event.get("[attr][command]").to_s)' }
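For several fields, a variant of that one-liner could be (a sketch; the field list is illustrative and should match whatever objects cause the mapping conflicts):

ruby {
	code => '
		# Serialize the most versatile sub-objects to plain strings so the index
		# mapping stays flat and type-stable.
		["command", "originatingCommand", "error"].each do |f|
			value = event.get("[attr][#{f}]")
			event.set("[attr][#{f}]", value.to_s) unless value.nil?
		end
	'
}

If keeping valid JSON in the serialized string matters downstream, value.to_json may be an option instead of to_s (assuming the JSON library is available in the Logstash Ruby filter environment).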

Hi Badger,
this worked; I was able to isolate the 5 or 6 objects that are far too versatile. Thanks a lot.
But I don't think this is a proper way to handle it, as I lose a lot of intelligence and detail :frowning:
Maybe the MongoDB log structure is not suitable yet?
