Why is bulk indexing faster than single indexing in Elasticsearch?

slowup · August 22, 2023, 8:22am

I wonder why bulk indexing is faster than single indexing.
I'm asking from Elasticsearch's point of view, setting aside the cost of opening and closing connections and other network overhead.

Based on the default refresh interval, if I send 1,000 single-document index requests per second versus 1 bulk request (1,000 documents) per second, is the bulk request faster from Elasticsearch's point of view? I'm curious about the details.

Christian_Dahlqvist · August 22, 2023, 8:30am

Unless the bulk request is targeting a very large number of shards, a bulk request will generally result in a number of documents being written to a shard's transaction log at once. Every write is, as far as I know, synced to disk before a response is sent, so using bulk requests can reduce the number of IOPS and the amount of disk I/O. There may be other benefits of handling documents in bulk, but this can have a big impact as disk I/O is often a limiting factor.

slowup · August 22, 2023, 10:14am

Writes are fsynced when a flush occurs.

I mean, in the above example, why is bulk better for performance when the same number of documents is processed within one refresh interval?

I don't think the performance will differ between indexing 1,000 documents one at a time in 1 second and indexing them as 1 bulk request (1,000 documents) in 1 second.

dadoonet · August 22, 2023, 10:17am

Also, it requires far fewer HTTP requests/responses.

Christian_Dahlqvist · August 22, 2023, 10:37am

No, that is not correct. Documents are written and fsynced to the transaction log irrespective of when the flush to create a new segment and make the documents searchable happens. I would therefore expect a not insignificant difference in disk I/O sync calls.
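
To make the difference concrete, here is a rough back-of-the-envelope sketch of the sync counts for the 1,000-document example. It is a deliberately simplified model, not how Elasticsearch accounts for I/O internally: it assumes one translog fsync per request per shard written to, and it ignores replicas and any batching of concurrent syncs. The function name and the 5-shard figure are hypothetical.

```python
def translog_fsyncs(requests: int, shards_touched_per_request: int = 1) -> int:
    """Estimate translog fsync calls under a simplified model.

    Assumption: with per-request durability, each request causes one
    fsync on every shard it writes to before the response is returned.
    """
    return requests * shards_touched_per_request


# 1,000 single-document requests, each landing on one shard.
single = translog_fsyncs(requests=1000)

# One bulk request whose 1,000 documents spread over 5 shards (hypothetical).
bulk = translog_fsyncs(requests=1, shards_touched_per_request=5)

print(single, bulk)  # 1000 5
```

Under these assumptions the same 1,000 documents cost about 1,000 syncs when sent one by one, but only as many syncs as there are target shards when sent as a single bulk request.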

slowup · August 23, 2023, 5:09am

@Christian_Dahlqvist

Is it correct that there is no difference in performance from the perspective of an Elasticsearch server between indexing a single document 1000 times at the pace of the refresh interval and processing a bulk request of 1000 documents at once?

slowup · August 23, 2023, 5:13am

The transaction log you are talking about is the translog, right?

Christian_Dahlqvist · August 23, 2023, 5:22am

Yes.

slowup · August 23, 2023, 6:34am

@Christian_Dahlqvist
So to recap, you're saying that bulk is better than single because the translog does less I/O, since the translog is synced once per request, right?

Christian_Dahlqvist · August 23, 2023, 6:36am

Yes, that is one aspect. There is also a difference between handling a single HTTP request for a bulk request of N documents vs N separate HTTP requests. There may be other factors also contributing.
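
As an illustration of the "one HTTP request for N documents" point, the sketch below builds the newline-delimited JSON body that the `_bulk` endpoint expects: one action line followed by one source line per document, with a trailing newline. It only constructs the body and does not send anything; the index name `my-index`, the helper name, and the sample documents are made up for the example.

```python
import json


def build_bulk_body(index_name: str, docs: list[dict]) -> str:
    """Build an NDJSON body for the _bulk API from a list of documents."""
    lines = []
    for doc in docs:
        # Action line: index this document into the given index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"id": i, "msg": f"event {i}"} for i in range(1000)]
body = build_bulk_body("my-index", docs)

# Two lines per document, all carried by a single HTTP request.
print(body.count("\n"))  # 2000
```

The resulting string would be sent once as `POST /_bulk` with the `Content-Type: application/x-ndjson` header, instead of issuing 1,000 separate index requests.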

emmning · August 23, 2023, 9:34am

If the index.translog.durability parameter is set to async, then this will have no effect, right?

Christian_Dahlqvist · August 23, 2023, 9:42am

That setting will remove the overhead for fsyncing individual requests at the cost of reduced durability and resilience. Other factors, e.g. overhead related to handling multiple HTTP requests, would not go away though, so I would still expect a performance difference.
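
For reference, a minimal sketch of the settings body that switches an index between the two durability modes. The helper name is hypothetical; the setting name `index.translog.durability` and its values `request` and `async` come from the discussion above and the Elasticsearch translog settings. The sketch only builds the JSON and does not contact a cluster.

```python
import json


def translog_durability_settings(durability: str = "request") -> dict:
    """Return a settings body for index.translog.durability.

    "request": fsync the translog before acknowledging each request.
    "async":   fsync in the background, so a crash can lose recent writes.
    """
    if durability not in ("request", "async"):
        raise ValueError(f"unsupported durability: {durability!r}")
    return {"index": {"translog": {"durability": durability}}}


print(json.dumps(translog_durability_settings("async")))
# {"index": {"translog": {"durability": "async"}}}
```

A body like this would typically be sent with `PUT /<index>/_settings`. As noted above, `async` trades durability for fewer syncs and does not remove the per-request HTTP overhead.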

system · September 20, 2023, 9:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
