MongoDB (5+, JSON-based): some objects have variable types

Hi
(new here)
I've successfully managed to ship the logs of all of our MongoDB 5 instances (JSON-based), via Filebeat, to a Logstash instance.
MongoDB 5+ writes JSON-based logs (https://www.mongodb.com/docs/manual/reference/log-messages/). The "attr" object is the one containing all the log details, and it can be extremely versatile (it can hold simple context such as client name, IP, etc., or much more complex payloads like full pipeline outputs).
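For reference, a MongoDB 5+ log line looks roughly like this (the values here are made up for illustration; the top-level keys t, s, c, id, ctx, msg and attr are the ones described in the documentation linked above):

{"t":{"$date":"2022-08-19T08:10:07.123+00:00"},"s":"I","c":"NETWORK","id":22943,"ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.0.0.5:51234","connectionCount":12}}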
I'm using a simple JSON filter in my Logstash pipeline config, with a couple of simple renames for the sake of readability:

filter {
	json {
		# The whole log line is a JSON document, so parse it from "message".
		source => "message"
	}
	mutate {
		# Rename the terse MongoDB keys to readable field names.
		rename => {"[t][$date]" => "timestamp"}
		rename => {"s" => "log_level"}
		rename => {"c" => "component"}
		rename => {"ctx" => "context"}
		rename => {"attr" => "log_data"}
	}
}

The JSON is properly parsed, and I can create my index patterns (they can contain up to 700 fields, due to the versatile nature of the attr objects).

Here is an example:

attr.originatingCommand.$clusterTime.clusterTime.$timestamp.i	37
attr.originatingCommand.$clusterTime.clusterTime.$timestamp.t	1,660,896,607
attr.originatingCommand.$clusterTime.signature.hash.$binary.base64	kIB/B5Yt6HBDKu3aZQmYS4EoiFY=
attr.originatingCommand.$clusterTime.signature.hash.$binary.subType	0

The problem is that some nested JSON object properties can have different types over time, depending on the log. I have seen this for nearly 10 fields.

Mostly, properties that normally hold string, number, or boolean values sometimes receive a JSON object instead.

In this case we get these kinds of errors:

failed to parse field [attr.XXX.YYYY] of type [text]...Preview of field's value: <<mostly a JSON object>>

Here is a concrete example, for the key attr.error, which is a string in 99% of events:

"failed to parse field [attr.error] of type [text] in document with id 'oUZLsIIBg454KxwnKxz4'. Preview of field's value: '{keyPattern={_id=1}, code=11000, keyValue={_id=xxxxx}, codeName=DuplicateKey, errmsg=E11000 duplicate key error collection: yyyyyyy index: _id_ dup key: { _id: \"xxxxxx\" }}'"

As a first idea, I could differentiate on the component and route events to different indices, but I wouldn't be able to go much further with that approach (and it's not desirable anyway).
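For illustration, that routing would look roughly like this at the output stage (a sketch only; the host and index names are made up):

output {
	if [component] == "COMMAND" {
		elasticsearch {
			hosts => ["http://localhost:9200"]
			index => "mongodb-command-%{+YYYY.MM.dd}"
		}
	} else {
		elasticsearch {
			hosts => ["http://localhost:9200"]
			index => "mongodb-other-%{+YYYY.MM.dd}"
		}
	}
}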

I tried a lot of things: forcing these fields to be converted to strings (clearing the index each time), testing their type, and so on.
The only way I found to avoid the errors is to force a replacement by null for each of the identified fields. That may be acceptable for some of them, but not for "error".
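Concretely, that null workaround looks roughly like this (a sketch; attr.error is just one of the affected fields, and event.remove would work as well if dropping the field entirely is acceptable):

ruby {
	# Blank out the conflicting object so Elasticsearch no longer tries to
	# index a JSON object into a field mapped as text.
	code => 'event.set("[attr][error]", nil) unless event.get("[attr][error]").nil?'
}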

I may be missing something; maybe there is a better approach?

Thanks for your hints :slight_smile:

I'm adding some details.
After a while, the limit of 1000 fields per index was exceeded for a single index.

"reason"=>"Limit of total fields [1000] in index

This means the JSON approach as I tried it is not viable.
Is there a way, at the filter level (or maybe in Filebeat), to avoid recursively parsing some JSON objects? For instance, attr.command would remain a single field (raw JSON, not parsed), i.e. avoiding attr.command.arg1, attr.command.arg2, and so on?

Not that I know of. However, you can serialize some of the objects.

ruby { code => 'event.set("[attr][command]", event.get("[attr][command]").to_s)' }
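For several fields, a variant of that one-liner could be (a sketch; the field list is illustrative and should match whatever objects cause the mapping conflicts):

ruby {
	code => '
		# Serialize the most versatile sub-objects to plain strings so the index
		# mapping stays flat and type-stable.
		["command", "originatingCommand", "error"].each do |f|
			value = event.get("[attr][#{f}]")
			event.set("[attr][#{f}]", value.to_s) unless value.nil?
		end
	'
}

If keeping valid JSON in the serialized string matters downstream, value.to_json may be an option instead of to_s (assuming the JSON library is available in the Logstash Ruby filter environment).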

Hi Badger,
this worked; I was able to isolate the 5 or 6 objects that are far too versatile. Thanks a lot.
But I don't think this is a proper way to handle it, as I lose a lot of intelligence and detail :frowning:
Maybe the MongoDB log structure is not suitable yet?
