Index Settings for Optimizing Indexing Throughput

In our logging clusters we're indexing around 40k messages/sec, roughly 500 GB per index per day. The storage backend is Ceph, and it sometimes struggles as load increases over time.

Consequently, I'm wondering what I might do to optimize indexing in a way that accounts for sub-optimal storage. One thing I'm considering is doubling index.translog.flush_threshold_size. Is this considered a good move when looking to maximise throughput? We already have index.refresh_interval set to 30s, so I figured this change could align reasonably well with that.
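For reference, here's roughly what I'd apply (the default flush_threshold_size is 512mb, so doubling it means 1gb; the index name is just a placeholder for our daily logging indices):

```
PUT logs-example-index/_settings
{
  "index": {
    "refresh_interval": "30s",
    "translog": {
      "flush_threshold_size": "1gb"
    }
  }
}
```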


I'm not sure it'll make much difference, but you can try.

Looking at the manual page on tuning for indexing speed, the advice to avoid network-attached storage stands out. Could you move to a hot/warm architecture? That would let you index to fast local disks and then move each day's indices onto your Ceph-backed nodes once indexing is complete.
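A minimal sketch of how that can be wired up with shard allocation filtering, assuming a custom node attribute (here called box_type) and a placeholder daily index name:

```
# elasticsearch.yml on nodes with fast local disks:
#   node.attr.box_type: hot
# elasticsearch.yml on the Ceph-backed nodes:
#   node.attr.box_type: warm

# New indices are created on hot nodes; once a day's index is
# no longer being written to, retag it and it migrates to warm:
PUT logs-2024.01.01/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}
```

Index lifecycle management can automate this rollover/relocation step on a schedule rather than doing it by hand.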
