Bulk update caused DocumentMissing after Bulk Insert

I am working on a tool that syncs changes from MySQL to ES. While testing the tool, I found something that I can't understand. Here is part of the log:

# bulk insert about 100 docs
2019-04-14 11:16:26,049 index {[test_0][news][36], source[...]}, index {..}
# bulk update 100 docs by id
2019-04-14 11:16:26,428 update {[test_0][news][36], doc[index {[null][null][null], source[{...}], detect_noop[true]}
# some docs update report DocumentMissing
item.getFailure().getCause() instanceof DocumentMissingException

Is there any reason why this happens? (I am using ES 5.6.)

Anyone here?

Can you share the details of the API calls that your tool is making and the JSON responses?

The request code:

  // Collect every pending write into a single bulk request.
  BulkRequestBuilder bulkRequest = client.prepareBulk();
  for (SyncWrapper<WriteRequest> requestWrapper : aim) {
    WriteRequest request = requestWrapper.getData();
    if (request instanceof IndexRequest) {
      bulkRequest.add((IndexRequest) request);
    } else if (request instanceof UpdateRequest) {
      bulkRequest.add((UpdateRequest) request);
    } else if (request instanceof DeleteRequest) {
      bulkRequest.add((DeleteRequest) request);
    }
  }
  // execute() is asynchronous: it returns a future immediately.
  ListenableActionFuture<BulkResponse> future = bulkRequest.execute();

The response handling:

  BulkResponse bulkResponse = future.get();
  if (bulkResponse.hasFailures()) {
    BulkItemResponse[] items = bulkResponse.getItems();
    for (BulkItemResponse item : items) {
      // getFailure() is null for items that succeeded, so check isFailed() first.
      if (item.isFailed() && item.getFailure().getCause() instanceof DocumentMissingException) {}
    }
  }

I tried but failed to reproduce this on 5.6.16.

I made index.json containing 100 requests:

{"index":{"_type":"news","_id":"1","_index":"test_0"}}
{"foo":"bar"}
{"index":{"_type":"news","_id":"2","_index":"test_0"}}
{"foo":"bar"}
...
{"index":{"_type":"news","_id":"100","_index":"test_0"}}
{"foo":"bar"}

I made update.json also containing 100 requests:

{"update":{"_type":"news","_id":"1","_index":"test_0"}}
{"doc":{"foo":"baz"}}
{"update":{"_type":"news","_id":"2","_index":"test_0"}}
{"doc":{"foo":"baz"}}
...
{"update":{"_type":"news","_id":"100","_index":"test_0"}}
{"doc":{"foo":"baz"}}

I created a new index and immediately ran these bulk requests and saw no errors:

$ curl -XDELETE 'http://localhost:9200/test_0?pretty'; \
  curl -XPUT 'http://localhost:9200/test_0?pretty' -H 'Content-type: application/json' --data-binary $'{"settings":{"number_of_shards":2,"number_of_replicas":1}}'; \
  curl 'http://localhost:9200/_bulk?pretty&filter_path=errors' -H 'Content-type: application/x-ndjson' --data-binary @index.json; \
  curl 'http://localhost:9200/_bulk?pretty&filter_path=errors' -H 'Content-type: application/x-ndjson' --data-binary @update.json
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test_0"
}
{
  "errors" : false
}
{
  "errors" : false
}

Can you describe how to reproduce the problem you're seeing like this?

I think I found the error: the index request and the update request are sent by two different threads, so even though the index request is issued slightly earlier, we can't guarantee the order in which those two batches are processed. Thanks for your help.

Ah yes that'd explain it. You have to wait for the response from the indexing request before doing the update, otherwise it's not really "after".
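
For anyone hitting the same thing, here is a minimal sketch of that ordering, written with plain `CompletableFuture` so it runs without a cluster. The `sendBulk` method, the batch names and the delays are hypothetical stand-ins for `bulkRequest.execute()`, not part of the Elasticsearch client. The index batch is made deliberately slow to show that the update batch still only starts once the index batch has responded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrderedBulkSketch {
    // Records the order in which the (simulated) batches were applied.
    static final List<String> applied = new CopyOnWriteArrayList<>();

    // Hypothetical stand-in for bulkRequest.execute(): applies a batch
    // asynchronously after an artificial delay and returns a future.
    static CompletableFuture<String> sendBulk(String name, long delayMillis,
                                              ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            applied.add(name);
            return name;
        }, pool);
    }

    // Sends the update batch only after the index batch has responded.
    static List<String> runInOrder() {
        applied.clear();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            sendBulk("index", 50, pool)
                .thenCompose(indexResponse -> sendBulk("update", 0, pool))
                .join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(applied);
    }

    public static void main(String[] args) {
        System.out.println(runInOrder()); // prints [index, update]
    }
}
```

With the code posted earlier in this thread, the equivalent would presumably be to call `future.get()` on the index bulk and only then build and execute the update bulk, rather than executing both from independent threads.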
