Client: net6.0 v8.13.10
Elastic: v8.13.0
Using .NET TPL I have managed to improve our bulk operation efficiency to the point where Elastic is now our bottleneck. We need to load 250+ million records and the indexing task is noticeably slower than the load from database and transform.
I have tried to follow all the best practices I could find: index design, indexing only searchable fields, configuring the Elastic instance, etc.
Can/should I parallelize the bulk calls? They are currently queued and processed one at a time.
Are there improvements to be made to the bulk call below?
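try
{
    var waiter = new CountdownEvent(1);
    Exception bulkError = null;
    var bulk = Client.BulkAll(batch, b => b // batch <= 1000 records
        .BackOffRetries(2)
        .Index(IndexName<T>())
        .BackOffTime(TimeSpan.FromSeconds(30))
        .RefreshOnCompleted(false)
        .MaxDegreeOfParallelism(2)
        // Size equals the batch size, so each call sends a single bulk request
        .Size(batch.Count()), cancellationToken);
    bulk.Subscribe(new BulkAllObserver(
        // onError runs on a background thread, so throwing here would not reach
        // the catch below and the waiter would never be signaled.
        // Record the error and release the waiter instead.
        onError: e => { bulkError = e; waiter.Signal(); },
        onCompleted: () => waiter.Signal()
    ));
    waiter.Wait(cancellationToken);
    if (bulkError != null)
    {
        // Rethrow on the calling thread, keeping the original stack trace
        System.Runtime.ExceptionServices.ExceptionDispatchInfo.Capture(bulkError).Throw();
    }
}
catch (Exception e)
{
    _logger.LogError("Error in bulk indexing: {message}\r\n{stack}", e.Message, e.StackTrace);
    onError?.Invoke(e);
}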