Bulk API Insert Data missing

Hi Team,

We are using Elasticsearch 7 for our Java application, with elasticsearch-rest-high-level-client version 7.13.3. Bulk inserts go through restHighLevelClient.bulk(createIndexRequest(contacts), RequestOptions.DEFAULT).

We ingest documents into Elasticsearch in bulks of 500 using the restHighLevelClient bulk API. We have noticed that we get a 200 OK response from Elasticsearch, yet some documents from the same bulk request are missing. The response below reports success, but the document is not searchable. We are not sure why this is happening. However, when we re-sync the same data some time later, without any changes, it is indexed and becomes available in Elasticsearch.

Can you please help us figure out what might be wrong?

Response from Elasticsearch that reports success, although the document is not available:

[index=contacts_2021_march,type=_doc,id=0000000000000012345,version=3,result=updated,seqNo=536527,primaryTerm=7,shards={"total":3,"successful":3,"failed":0}]

Do you refresh the index before searching?

Are you doing that in an integration test?

Yes, the index is refreshed; it has "refresh_interval" : "2s".
No, this is not an integration test. We have this issue in our PROD environment and cannot recreate it in lower environments. I have ingested the data multiple times but cannot reproduce the scenario.

We have a nightly job that pushes data to Elasticsearch in bulks of 500 every 30 seconds. Every day about 30-40k documents are ingested, and most of them have updates to 2-3 fields. When such an update happens, we find that the document is missing even though it was searchable previously. So we end up with missing data in Elasticsearch. The strange thing is that if we re-ingest the missing data without any change to the document, it is ingested and becomes searchable.

We didn't see any rejections, and we get a 200 OK response with a success message for the shards. Still, the data is not searchable.

Here is the code we use to add/update data in Elasticsearch:

BulkResponse bulkResponse = restHighLevelClient.bulk(createIndexRequest(contacts), RequestOptions.DEFAULT);


private BulkRequest createIndexRequest(List<Contact> contacts) {
    BulkRequest bulkRequest = new BulkRequest();
    for (Contact contact : contacts) {
        IndexRequest indexRequest = new IndexRequest(ElasticConstant.INDEX_CONTACTS);
        indexRequest.id(StringUtils.leftPad(contact.getContactId(), ElasticConstant.CONTACT_LENGTH, ElasticConstant.PAD_CHAR_ZERO));
        indexRequest.source(objectMapper.convertValue(contact, Map.class));
        bulkRequest.add(indexRequest);
    }
    return bulkRequest;
}

If all the data is ingested correctly, the response shows something like "errors": false, and the index is refreshed, then the only thing I can think of is an _id conflict: you are using the same _id for multiple documents, so the latest document overwrites the previous one...
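
One way to test the _id conflict idea offline is to run the contact IDs of a nightly batch through the same padding step and look for IDs that end up identical. The following is only a rough sketch, not the poster's code: the class PaddedIdCollisions, the helper leftPadZero (a plain-Java stand-in for StringUtils.leftPad with '0' as pad character), and the length 19 are all assumptions made for illustration, since the real value of ElasticConstant.CONTACT_LENGTH is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone sketch: finds raw contact IDs that map to the same padded _id. */
final class PaddedIdCollisions {

    // Hypothetical stand-in for ElasticConstant.CONTACT_LENGTH; the real value is not given in the thread.
    static final int ASSUMED_ID_LENGTH = 19;

    /** Pads with leading zeros up to the given length; longer inputs are returned unchanged. */
    static String leftPadZero(String id, int length) {
        if (id == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = id.length(); i < length; i++) {
            sb.append('0');
        }
        return sb.append(id).toString();
    }

    /** Groups raw IDs by padded _id and keeps only the groups holding more than one raw ID. */
    static Map<String, List<String>> findCollisions(List<String> rawIds, int length) {
        Map<String, List<String>> byPaddedId = new LinkedHashMap<>();
        for (String raw : rawIds) {
            byPaddedId.computeIfAbsent(leftPadZero(raw, length), k -> new ArrayList<>()).add(raw);
        }
        // A group of size 1 means the padded _id is unique within this batch.
        byPaddedId.values().removeIf(group -> group.size() < 2);
        return byPaddedId;
    }
}
```

Calling PaddedIdCollisions.findCollisions(rawIds, PaddedIdCollisions.ASSUMED_ID_LENGTH) on the raw contact IDs of one batch returns an empty map when every padded _id is unique. Any entry it does return lists contacts that would overwrite each other in the index, for example "12345" and "0012345", or the same contact appearing twice in one bulk request.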