I wonder why bulk indexing is faster than single-document indexing.
I'm curious about this from Elasticsearch's point of view, setting aside the cost of opening and closing connections and network communication.
Assuming the default refresh interval, if I send 1,000 single index requests per second versus 1 bulk request (1,000 documents) per second, is the bulk request faster from Elasticsearch's point of view? I'm curious about the details.
Unless the bulk request targets a very large number of shards, it will generally result in a number of documents being written to a shard's transaction log at once. As far as I know, every write is synced to disk before a response is sent, so using bulk requests can reduce the number of IOPS and the amount of disk I/O. There may be other benefits of handling documents in bulk, but this can have a big impact, as disk I/O is often a limiting factor.
No, that is not correct. Documents are written and fsynced to the transaction log irrespective of when the refresh that creates a new segment and makes the documents searchable happens. I would therefore expect a not insignificant difference in the number of disk I/O sync calls.
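To make the fsync argument concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not how Elasticsearch is implemented: it assumes one translog fsync per shard touched per request, and it uses a made-up modulo routing function (`shard_for`) in place of Elasticsearch's real routing hash. The shard count of 5 is also just an assumption for illustration.

```python
def shard_for(doc_id: int, num_shards: int) -> int:
    # Hypothetical routing: a plain modulo stands in for the real routing hash.
    return doc_id % num_shards


def count_fsyncs(requests, num_shards):
    """Count translog fsyncs, assuming one fsync per shard touched per request."""
    total = 0
    for doc_ids_in_request in requests:
        shards_touched = {shard_for(d, num_shards) for d in doc_ids_in_request}
        total += len(shards_touched)
    return total


doc_ids = list(range(1000))

# 1,000 requests carrying one document each.
single_requests = [[d] for d in doc_ids]
# 1 request carrying all 1,000 documents.
bulk_request = [doc_ids]

single_fsyncs = count_fsyncs(single_requests, num_shards=5)
bulk_fsyncs = count_fsyncs(bulk_request, num_shards=5)

print("single-document requests:", single_fsyncs, "fsyncs")
print("one bulk request:", bulk_fsyncs, "fsyncs")
```

Under these assumptions the single-document path syncs once per document, while the bulk path syncs at most once per shard. The model also shows why the benefit shrinks when a bulk request is spread over a very large number of shards.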
Is it correct that, from the perspective of an Elasticsearch server, there is no difference in performance between indexing 1,000 single documents within one refresh interval and processing one bulk request of 1,000 documents?
@Christian_Dahlqvist
So, to recap: you're saying bulk is better than single-document indexing because the translog incurs less I/O, since translog I/O happens per request, right?
Yes, that is one aspect. There is also a difference between handling a single HTTP request for a bulk request of N documents vs N separate HTTP requests. There may be other factors also contributing.
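As a rough illustration of the HTTP side, the sketch below builds the newline-delimited JSON body that the `_bulk` endpoint accepts (an action line followed by a source line for each document). The index name `my-index`, the document contents, and the helper name `build_bulk_body` are assumptions made up for this example, and nothing is actually sent to a cluster.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk API from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents, for illustration only.
docs = [(i, {"message": f"event {i}"}) for i in range(1000)]
body = build_bulk_body("my-index", docs)

http_requests_single = len(docs)  # one HTTP request per document
http_requests_bulk = 1            # one HTTP request for the whole batch

print(http_requests_single, "requests vs", http_requests_bulk, "request")
print(len(body.splitlines()), "NDJSON lines in the bulk body")
```

With single-document indexing, the server parses, routes, and answers 1,000 separate HTTP requests; with bulk, that per-request overhead is paid once for the whole batch.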
That setting will remove the overhead for fsyncing individual requests at the cost of reduced durability and resilience. Other factors, e.g. overhead related to handling multiple HTTP requests, would not go away though, so I would still expect a performance difference.