Bulk index efficiency advice

Kyle_Manuel · April 25, 2024, 3:51am

Client: net6.0 v8.13.10
Elastic: v8.13.0

Using the .NET TPL I have managed to improve our bulk operation efficiency to the point where Elasticsearch is now our bottleneck. We need to load 250+ million records, and the indexing task is noticeably slower than the load from the database and the transform.

I have attempted to follow all the best practices I could find for designing indexes, indexing only searchable fields, configuring the Elasticsearch instance, etc.

Can/should I parallelize the bulk calls (currently they are queued and processed one at a time)?

Are there improvements to be made to the bulk call below?

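Exception bulkError = null;
try
{
    var waiter = new CountdownEvent(1);
    var bulk = Client.BulkAll(batch, b => b // batch <= 1000 records
        .BackOffRetries(2)
        .Index(IndexName<T>())
        .BackOffTime(TimeSpan.FromSeconds(30))
        .RefreshOnCompleted(false)
        .MaxDegreeOfParallelism(2)
        .Size(batch.Count()), cancellationToken);
    bulk.Subscribe(new BulkAllObserver(
        // Throwing inside the callback would not reach the catch block below
        // and would leave the waiter unsignalled, so record the error instead.
        onError: (e) => { bulkError = e; waiter.Signal(); },
        onCompleted: () => waiter.Signal()
        ));
    waiter.Wait(cancellationToken);
    // Rethrow on the calling thread so the catch block handles it.
    if (bulkError != null) throw bulkError;
}
catch (Exception e)
{
    _logger.LogError("Error in bulk indexing: {message}\r\n{stack}", e.Message, e.StackTrace);
    onError?.Invoke(e);
}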

Christian_Dahlqvist · April 25, 2024, 5:17am

Absolutely. If you are not sending bulk requests in parallel across multiple connections to Elasticsearch, that lack of parallelism is likely your bottleneck. When I have run benchmarks, I have always needed multiple parallel indexing jobs to saturate Elasticsearch. Exactly how many parallel tasks are ideal, and what the optimal bulk size is, will depend on your cluster, your data and your sharding strategy, as well as on what other load the cluster is under, so this is something you need to test.

You can see this recommended in the official documentation.
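
As a rough illustration of sending batches in parallel, here is a minimal sketch using Parallel.ForEachAsync from .NET 6. It is not tied to the Elasticsearch client: IndexInParallelAsync, sendBulkAsync and maxParallel are hypothetical names introduced for this example, and sendBulkAsync stands in for whatever code issues one bulk request for one batch. The right value for maxParallel is an assumption you would have to find by testing against your own cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sends pre-built batches with at most `maxParallel` bulk requests in flight.
// `sendBulkAsync` is a hypothetical delegate that issues ONE bulk request for
// ONE batch (for example, by wrapping a call on the Elasticsearch client).
// Returns the number of records whose batch completed without throwing.
static async Task<long> IndexInParallelAsync<T>(
    IEnumerable<IReadOnlyList<T>> batches,
    Func<IReadOnlyList<T>, CancellationToken, Task> sendBulkAsync,
    int maxParallel,
    CancellationToken cancellationToken = default)
{
    long indexed = 0;
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = maxParallel,
        CancellationToken = cancellationToken
    };

    // Workers take the next batch from the source as they become free,
    // so the whole data set does not need to be materialised up front.
    await Parallel.ForEachAsync(batches, options, async (batch, ct) =>
    {
        await sendBulkAsync(batch, ct);
        Interlocked.Add(ref indexed, batch.Count);
    });

    return indexed;
}
```

Starting with a small maxParallel (say 2 to 4) and increasing it while watching indexing throughput and bulk rejections is one way to search for the point where the cluster, rather than the client, is saturated. If sendBulkAsync throws, Parallel.ForEachAsync stops scheduling further batches and surfaces the exception to the caller.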
