Avoid duplicate documents

Hi,
I am using the bulk operation to create documents. The issue is that it creates duplicate documents.

    actions = [{
        "_index": index_name,
        "_type": doc_type,
        "_source": doc
    } for doc in json_list]
    try:
        helpers.bulk(esClient, actions)
    except Exception as ex:
        print("Unable to do bulk create {0}".format(index_name))

json_list1 = [ {'created_on' : 'date1', 'hostaname': 'abcd', 'msg': 'abcdef'}, {'created_on' : 'date2', 'hostaname': 'abcdef', 'msg': 'abcdeeff'}]

json_list2 = [{'created_on' : 'date1', 'hostaname': 'abcd', 'msg': 'abcdef'}]

When I index json_list2 after json_list1, it creates a new document, giving 3 in total, whereas there should be only 2 documents.

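In your code, "_id" is missing. Without it, Elasticsearch generates a new ID for every document, so identical content is indexed again as a separate document. "_id" should point to a field in your document that identifies the doc as unique; a doc with an existing "_id" is then overwritten instead of duplicated. The final code would look like this: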

    actions = [{
        "_index": index_name,
        "_type": doc_type,
        "_source": doc,
        # Same "_id" on a later bulk call overwrites the existing doc
        "_id": <A field from your doc>
    } for doc in json_list]
    try:
        helpers.bulk(esClient, actions)
    except Exception as ex:
        print("Unable to do bulk create {0}".format(index_name))
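
If no single field is unique on its own (in the sample lists, none obviously is), one option is to derive the "_id" from a hash of the fields that together define a duplicate. The following is only a sketch under that assumption: `make_doc_id`, `build_actions`, and the choice of `created_on`, `hostaname`, and `msg` as key fields are illustrative names, not part of the Elasticsearch client. `doc_type` is left optional because newer Elasticsearch versions no longer use mapping types.

```python
import hashlib
import json


def make_doc_id(doc, key_fields=("created_on", "hostaname", "msg")):
    """Build a deterministic ID from the fields assumed to define uniqueness."""
    # Serialize the key field values in a fixed order so equal docs hash equally.
    key = json.dumps([doc.get(field) for field in key_fields], default=str)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def build_actions(docs, index_name, doc_type=None):
    """Return bulk actions whose "_id" depends only on the document content."""
    actions = []
    for doc in docs:
        action = {
            "_index": index_name,
            "_id": make_doc_id(doc),
            "_source": doc,
        }
        if doc_type is not None:
            action["_type"] = doc_type
        actions.append(action)
    return actions
```

The resulting list can be passed to `helpers.bulk(esClient, actions)` as before. Since the doc in json_list2 gets the same "_id" as the first doc in json_list1, the second bulk call overwrites that document rather than adding a third one. If you would rather have the duplicate rejected than overwritten, adding `"_op_type": "create"` to each action should make the bulk helper report a conflict for IDs that already exist.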
