Metricbeat mongodb replstatus metricset causing high CPU usage

Hi All,

We recently upgraded to Metricbeat 8.14.3 to support our MongoDB 5.0 upgrade. Since the upgrade, we have noticed high CPU usage.
We traced it to the mongodb replstatus metricset. It runs the expensive aggregation below on the oplog every 10s, and each run takes about 8s. The query does a COLLSCAN on oplog.rs:

{
  "$group": {
    "minTS": {
      "$min": "$ts"
    },
    "_id": 1,
    "maxTS": {
      "$max": "$ts"
    }
  }
}

Our oplog.rs collection is about 3 GB. That is not huge, and we have much larger oplogs in other environments. Couldn't the same information be gathered by calling db.getReplicationInfo(), without the COLLSCAN?
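For context, the only thing the metricset needs from those two timestamps is the oplog window, i.e. the time between the oldest and newest entry. A minimal Go sketch of that arithmetic follows. It assumes the timestamps are already available as the seconds part of the BSON "ts" value (getOpTimestamp returns a uint32). The function name oplogWindow and the sample values are made up for illustration and are not taken from the Metricbeat source.

```go
package main

import (
	"fmt"
	"time"
)

// oplogWindow returns the time span covered by the oplog, given the
// seconds component of the oldest and newest entries' "ts" values.
// Hypothetical helper, not part of Metricbeat.
func oplogWindow(firstTs, lastTs uint32) time.Duration {
	if lastTs < firstTs {
		// Guard against unsigned underflow if the inputs are swapped.
		return 0
	}
	return time.Duration(lastTs-firstTs) * time.Second
}

func main() {
	// Made-up sample values: two timestamps exactly one day apart.
	first := uint32(1_720_000_000)
	last := uint32(1_720_086_400)
	fmt.Println(oplogWindow(first, last)) // prints 24h0m0s
}
```

So any approach that yields the first and last "ts" cheaply gives the same metric as the $group stage.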

I noticed this query was a change added to fix this problem:

([MongoDB] replstatus large temp files and sort function issues · Issue #8683 · elastic/integrations · GitHub)

As a workaround for now, I have reduced the frequency of the mongodb metrics and have them run against the secondary.

Thanks,
Jay

Looking at this more, the query is being called by the getOpTimestamp(collection *mongo.Collection) function here:

https://github.com/elastic/beats/blob/main/metricbeat/module/mongodb/replstatus/info.go

I wish I knew Go so I could help with a PR,
but these are the queries I would run on the oplog.rs collection to get the oldest and newest timestamps without an expensive collection scan:

db.oplog.rs.find({op:"i"},{"ts":1, "_id":0}).sort({$natural: 1}).limit(1);
db.oplog.rs.find({op:"i"},{"ts":1, "_id":0}).sort({$natural: -1}).limit(1);

I would greatly appreciate it if somebody would be willing to give me a hand.

Update:

Looks like the original code was closer. The -$natural sort was the original problem, since it caused excessive temp file usage. If the getOpTimestamp function could be modified to accept a "sort_order" argument, I think it may be an easy fix:

	// get first and last items in the oplog
	// 1 = ascending natural order (oldest entry first)
	firstTs, err := getOpTimestamp(collection, 1)
	if err != nil {
		return nil, fmt.Errorf("could not get first operation timestamp in op log: %w", err)
	}

	// -1 = descending natural order (newest entry first)
	lastTs, err := getOpTimestamp(collection, -1)
	if err != nil {
		return nil, fmt.Errorf("could not get last operation timestamp in op log: %w", err)
	}

func getOpTimestamp(collection *mongo.Collection, order int) (uint32, error) {
	// The sort key is always "$natural"; the direction goes in the value
	// (1 or -1), not in a "-" prefix on the key.
	opt := options.Find().SetSort(bson.D{{Key: "$natural", Value: order}})

https://github.com/elastic/beats/blob/49ae09a5614b79696fd02e752bee56492a9ca72d/metricbeat/module/mongodb/replstatus/info.go#L73

The oplog is a special capped collection in MongoDB.
According to the MongoDB docs, natural order is the correct way to get the first and last record.
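To show why the two approaches give the same answer, here is a small self-contained Go sketch. It does not talk to MongoDB. It models the oplog as a slice kept in insertion order with non-decreasing timestamps, which is an assumption about how a capped collection behaves. The entry type and both function names are hypothetical.

```go
package main

import "fmt"

// entry is a simplified stand-in for an oplog document (hypothetical type).
type entry struct {
	ts uint32 // seconds part of the oplog timestamp
	op string // operation type, e.g. "i", "u", "d"
}

// scanMinMax mimics the $group stage with $min/$max: it has to visit
// every entry, like a COLLSCAN.
func scanMinMax(log []entry) (lo, hi uint32, visited int) {
	if len(log) == 0 {
		return 0, 0, 0
	}
	lo, hi = log[0].ts, log[0].ts
	for _, e := range log {
		visited++
		if e.ts < lo {
			lo = e.ts
		}
		if e.ts > hi {
			hi = e.ts
		}
	}
	return lo, hi, visited
}

// naturalEnds mimics sort({$natural: 1}).limit(1) and
// sort({$natural: -1}).limit(1): it only reads the two ends.
func naturalEnds(log []entry) (first, last uint32, visited int) {
	if len(log) == 0 {
		return 0, 0, 0
	}
	return log[0].ts, log[len(log)-1].ts, 2
}

func main() {
	// Made-up entries, stored in insertion order.
	log := []entry{{100, "i"}, {105, "u"}, {105, "i"}, {230, "d"}}

	lo, hi, n := scanMinMax(log)
	first, last, m := naturalEnds(log)
	fmt.Println(lo == first, hi == last) // prints true true
	fmt.Println(n, m)                    // prints 4 2
}
```

The scan cost grows with the size of the oplog, while reading the two ends in natural order stays constant, which is the difference that matters on a multi-GB oplog.rs.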