Performance management in CCR

We currently have CCR set up between two deployments within Elastic Cloud.

The main reason for having this configuration is that we write heavily to the leader, and don't mind a search performance hit there, but would like to maintain search speed on the follower.

When the follower is initialised, the performance on that cluster holds up well. When there is a heavy burst of writes to the leader, however, we see a massive spike in CPU usage on the follower.

There are the default follow values you can configure, but I can't find a configuration that gives me the desired result. Any advice?

Broadly speaking, a CCR follower does the same indexing work as the leader. Searches on the follower won't interfere with indexing on the leader, but the converse isn't true: indexing on the leader is replayed on the follower, where it competes with searches for CPU.

Hey David - thanks for the response. This is exactly what we're seeing, and what we're trying to avoid.

How might we set it up so that the indexing that occurs on the leader is somehow throttled when being passed on to the follower?

I imagine it might be some sort of setup with these values?

Write throttling isn't a specific thing you can do, although I guess there are various things you can do to de-tune things away from replication performance. I haven't tried any of the following ideas; I'm just thinking out loud here.

The most direct way to reduce CPU usage due to indexing seems to be to reduce the number of threads in the write threadpool. By default indexing will use all your CPUs but you could reduce the number of write threads to avoid that.

You could also try traffic shaping on the cross-cluster traffic, limiting the overall bandwidth available to replication. The trouble with that is bandwidth is not directly linked to CPU usage, and the relationship may change over time, so you'll need to experiment to find the right balance.

Reducing max_outstanding_read_requests and/or max_outstanding_write_requests would reduce the concurrency of replication, but only within each shard, so different shards will still be indexing concurrently.
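
To make that last idea concrete, here is an untested sketch of how the two settings could be changed on an existing follower: pause the follower, then resume it with lower values. It only builds the requests and prints them; it doesn't talk to a cluster. The index name and the value of 1 for each setting are placeholders I've assumed for illustration, not recommendations, and the helper function names are made up.

```python
import json

# Hypothetical follower index name; substitute your own.
FOLLOWER_INDEX = "my-follower-index"


def throttled_follow_params(max_reads=1, max_writes=1):
    """Build a resume_follow body that lowers per-shard replication concurrency.

    The values are illustrative assumptions; you would need to experiment.
    """
    return {
        "max_outstanding_read_requests": max_reads,
        "max_outstanding_write_requests": max_writes,
    }


def retune_requests(index, params):
    """Return (method, path, body) tuples: pause the follower, then resume it
    with the new parameters."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow", None),
        ("POST", f"/{index}/_ccr/resume_follow", json.dumps(params)),
    ]


# Print the requests so they can be pasted into a console or sent by a client.
for method, path, body in retune_requests(FOLLOWER_INDEX, throttled_follow_params()):
    print(method, path, body or "")
```

Bear in mind that these limits apply per shard follow task, so a follower index with many shards will still index on several shards at once.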

Given this is Elastic Cloud, have you raised this issue with our Support Team for further discussion?