Elasticsearch bulk index missing some records

Hi,
I am using Elasticsearch 5.6 with the Python client to bulk index records from some frequently written log files.
What do I need to take care of when bulk indexing so that no records are missed? Please help.

Are you looking at the bulk response and checking that all documents were reported as successfully written?

We are not checking the response. How can we handle this so that all records are indexed successfully?
Is there any parameter or anything else we should take care of? Please suggest.
Our Python client setup:
es = Elasticsearch([{'host': '192.168.1.xxx', 'port': 7205, 'timeout': 60}, {'host': '192.168.1.yyy', 'port': 7205, 'timeout': 60}])

You need to look at the bulk response and check that all records were successful. If some did fail, you need to handle those errors, e.g. by retrying them.
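For example, with the low-level `bulk` call in the elasticsearch-py client, the response contains one item per action, and each item can be inspected individually. A minimal sketch of that check (the `collect_failures` helper name is my own, not part of the client):

```python
def collect_failures(bulk_response):
    """Return the per-item results in a bulk response that were not indexed."""
    failures = []
    for item in bulk_response.get('items', []):
        # each item looks like {'index': {'_id': ..., 'status': 201, ...}}
        for op, result in item.items():
            if result.get('status', 0) >= 300 or result.get('error'):
                failures.append(result)
    return failures

# Against a live cluster (es is the Elasticsearch client instance):
# response = es.bulk(body=bulk_body)
# for failure in collect_failures(response):
#     print('retry needed for', failure.get('_id'), failure.get('error'))
```

Anything returned by `collect_failures` (e.g. status 429 rejections under load) should be re-sent, ideally after a backoff, rather than discarded.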

What is the possibility that we do not get an error in the response for a record, yet that record is not indexed in ES?

I do not think that should happen. If you get an acknowledgement without error from Elasticsearch the document has been indexed, although it may not yet be searchable unless a refresh has run.

Yes, you are right, but we are processing a file with approximately 500 records. All documents were indexed except a few records that went missing, without any error. So what should we take care of?

Are these documents reported as successfully indexed in the bulk response? How are you checking which documents are missing? Have you run a refresh or waited for one to occur before checking if they have been indexed? Are you allowing Elasticsearch to assign document IDs?
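On the refresh point: newly written documents are durable once acknowledged but not searchable until a refresh runs, so any "is it missing?" check should force a refresh first. A minimal sketch, assuming the elasticsearch-py 5.x client and illustrative index/type names (`find_missing` is my own helper):

```python
def find_missing(exists, expected_ids):
    """Return the IDs for which the exists(id) check is False."""
    return [doc_id for doc_id in expected_ids if not exists(doc_id)]

# Against a live cluster:
# es.indices.refresh(index='events')  # make recent writes searchable first
# missing = find_missing(
#     lambda doc_id: es.exists(index='events', doc_type='cm', id=doc_id),
#     expected_ids)
# print('missing after refresh:', missing)
```

Checking before the refresh interval elapses can make correctly indexed documents look missing.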


We keep backups of the processed files; in the same file of 500 records, a few records are not stored in ES.

We assign our own IDs. A sample doc:
{"index": {"_index": "events", "_type": "cm", "_id": "1132052186974525_65_1"}}
{"DT": "2018-07-04T10:27:15", "RE": "1", "PT": "2018-07-05T09:13:26", "ZI": "1132052186974525", "RT": "65"}
{"index": {"_index": "events", "_type": "cm", "_id": "1132052186974525_65_2"}}
{"DT": "2018-07-04T10:27:16", "RE": "2", "PT": "2018-07-05T09:13:26", "ZI": "1132052186974525", "RT": "65"}

have you run a refresh or waited for one to occur before checking if they have been indexed?

Could you answer that?

Could you share the full output of the Bulk Response that your python job is getting?
If too big for this forum, upload as a gist.github.com and share the link here.

Are you sure your document IDs are unique?

Yes, the doc IDs are unique. The default refresh setting is applied, which is false (we do not set refresh on the bulk request).

Default refresh time is 1 second. Is that what you are using?

Yes, we are using the default.

Do you have any non-default settings for Elasticsearch?

No, we use the default settings.

I have observed the following issue in bulk indexing:

  • a record that is included in the bulk data POST does not get any corresponding item in the response
  • so my conclusion is that the record is going missing somewhere in ES

Here are a few logs from our indexer in DEBUG mode:
1. 2018-07-05 17:30:10,716 - root - DEBUG - File: /home/developer/p1_py/logs/eventlogs/event2.log.2_01 -- Response:{'errors': False, 'items': [ ..... {'index': {'_shards': {'total': 2, 'successful': 2, 'failed': 0}, 'created': True, '_index': 'campevents', '_version': 1, 'result': 'created', '_type': 'cmev', 'forced_refresh': True, 'status': 201, 'id': '3160379186122643_72_2'}}
2018-07-05 17:30:01,674 - root - DEBUG - Event data {"index": {"_type": "cmev", "_index": "campevents", "id": "3160379186122643_72_2"}}:

2. 2018-07-05 17:00:01,757 - root - DEBUG - Event data {"index": {"_index": "campevents", "_id": "1993761866142132_77_10", "type": "cmev"}}:

In the above logs we found that "_id": "1993761866142132_77_10" only appeared in the Event data log but did not appear in the Response:{'errors': False, ...}.

Please review the above logs and give us your opinion.
** We also changed the refresh interval of that index to 30 seconds and set refresh: False. Please suggest if we are missing anything, so that there is no chance of losing any data.
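As a general pattern for this situation, the `streaming_bulk` helper in elasticsearch-py yields an `(ok, item)` tuple for every action when called with `raise_on_error=False`, so failed actions can be collected and retried instead of being silently dropped. A sketch (the `partition_results` helper is my own, not part of the library):

```python
def partition_results(results):
    """Split streaming_bulk-style (ok, item) pairs into successes and failures."""
    succeeded, failed = [], []
    for ok, item in results:
        (succeeded if ok else failed).append(item)
    return succeeded, failed

# Against a live cluster:
# from elasticsearch.helpers import streaming_bulk
# results = streaming_bulk(es, actions, raise_on_error=False)
# succeeded, failed = partition_results(results)
# # re-send everything in `failed`, ideally after a backoff
```

Because the helper pairs every submitted action with a result, a record that "never gets a response item" becomes visible as an explicit failure instead of a silent gap.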

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.