Elasticsearch bulk request API - Is the 'order' of document writing guaranteed?

Hi Team,

We are using the bulk request API to write documents in bulk into Elasticsearch. As per the API documentation, we add the documents one after another (i.e. in sequential order) and then call either the sync or the async API to write them (as shown below).

Is it guaranteed that client.bulk / client.bulkAsync will write the documents in the same order in which they were added to either the BulkProcessor or the BulkRequest object?

We are using the APIs below.

BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);

client.bulkAsync(request, RequestOptions.DEFAULT, listener);

Please suggest.

Thanks
Atanu

The order is not guaranteed.

Your bulk indexing request is received by a coordinating node. That node must then, document by document, determine the correct shard routing for each document, and then forward the documents to the nodes where each relevant shard is allocated.

Once the documents are distributed among the nodes, there can be no guarantees from node to node what order the indexing operations will be completed in, as the nodes are then working in parallel at whatever rate their available resources allow.
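To illustrate the splitting step described above, here is a minimal sketch in plain Java. It is a simulation only, not Elasticsearch code: the class BulkRoutingSketch, the methods shardFor and groupByShard, and the use of String.hashCode() as the routing hash are all assumptions made for illustration (Elasticsearch applies its own hash function to the routing value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation only; none of these names are Elasticsearch APIs.
final class BulkRoutingSketch {

    // Hypothetical stand-in for shard routing: hash the document id and take
    // it modulo the shard count. Elasticsearch uses a different hash function.
    static int shardFor(String docId, int numberOfShards) {
        return Math.floorMod(docId.hashCode(), numberOfShards);
    }

    // Splits the ids of one bulk request into per-shard lists, preserving the
    // original relative order inside each list.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numberOfShards) {
        Map<Integer, List<String>> perShard = new LinkedHashMap<>();
        for (String id : docIds) {
            perShard.computeIfAbsent(shardFor(id, numberOfShards), k -> new ArrayList<>()).add(id);
        }
        return perShard;
    }

    public static void main(String[] args) {
        // Document ids in the order they were added to the bulk request.
        List<String> bulk = Arrays.asList("id1", "id2", "id1", "id1", "id3", "id4");
        System.out.println(groupByShard(bulk, 3));
    }
}
```

In this sketch each per-shard list would be handed to a different node. Nothing coordinates the lists with one another, which is why no ordering can be assumed between documents that end up in different lists.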

Thank you for your quick response.

To make sure that we understand your reply correctly, could you please clarify the following?

If there are two or more update/upsert requests for the same document ID in a bulk request, are they applied to Elasticsearch in the same order in which they were added to either the BulkProcessor or the BulkRequest object (for async or sync requests respectively)?

Consider the example bulk request below, which contains 6 requests.

Request 1 - POST - update/upsert - http://elasticsearch-ip:port/myindex/mytype/id1 {"firstname":"Johan", "lastname":"andy", "rank":"5"}

Request 2 - POST - update/upsert - http://elasticsearch-ip:port/myindex/mytype/id2 {"firstname":"bob", "lastname":"raly", "rank":"2"}

Request 3 - POST - update/upsert - http://elasticsearch-ip:port/myindex/mytype/id1 {"firstname":"Johan", "lastname":"andy", "rank":"3"}

Request 4 - POST - update/upsert - http://elasticsearch-ip:port/myindex/mytype/id1 {"firstname":"Johan", "lastname":"andy", "rank":"7"} (this should be the latest document in Elasticsearch for document ID id1)

Request 5 - POST - update/upsert - http://elasticsearch-ip:port/myindex/mytype/id3 {"firstname":"tony", "lastname":"greg", "rank":"5"}

Request 6 - POST - update/upsert - http://elasticsearch-ip:port/myindex/mytype/id4 {"firstname":"paul", "lastname":"william", "rank":"5"}

As you can see, there are multiple requests for the document ID "id1": requests 1, 3 and 4 are all for that same document.

In the above example, is it guaranteed that once the bulk request completes, the content of document "id1" will be as per request 4, or is that not guaranteed?

Thanks in advance.

You can rely on operations on the same document (same _index, _type and _id) being applied in order. However, you can't assume anything about the relative order of operations on documents that have different indices/types/ids.

Be aware that sending multiple updates to the same document in a bulk request can cause multiple refreshes and performance problems. This actually applies to frequent updates to the same document even when they are in different bulk requests.
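If only the final state of each document matters, as in the id1 example above, one possible way to avoid repeated updates to the same document is to collapse the batch on the client before building the bulk request. The following is a minimal sketch in plain Java under that assumption. The class LastWriteWins, the Op type and the collapse method are hypothetical names, not part of the Elasticsearch client, and the approach is only safe when every update carries the full set of fields, so that earlier ones can be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper only; none of these names are Elasticsearch APIs.
final class LastWriteWins {

    // Hypothetical holder for one pending update: document id plus JSON body.
    static final class Op {
        final String id;
        final String json;

        Op(String id, String json) {
            this.id = id;
            this.json = json;
        }
    }

    // Keeps only the last operation per id. Assumes each operation carries the
    // complete document, so dropping earlier ones does not lose any fields.
    static List<Op> collapse(List<Op> ops) {
        Map<String, Op> latest = new LinkedHashMap<>();
        for (Op op : ops) {
            // A later entry replaces the earlier value for the same id; the id
            // keeps the position of its first occurrence in the map.
            latest.put(op.id, op);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        // The six requests from the example, with bodies abbreviated to "rank".
        List<Op> ops = Arrays.asList(
            new Op("id1", "{\"rank\":\"5\"}"),
            new Op("id2", "{\"rank\":\"2\"}"),
            new Op("id1", "{\"rank\":\"3\"}"),
            new Op("id1", "{\"rank\":\"7\"}"),
            new Op("id3", "{\"rank\":\"5\"}"),
            new Op("id4", "{\"rank\":\"5\"}"));
        for (Op op : collapse(ops)) {
            System.out.println(op.id + " " + op.json);
        }
    }
}
```

Each surviving entry would then be added to the BulkRequest as before, so the bulk request carries at most one operation per document ID.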

Thanks for the clarification.
