Bulk update failing erratically?

(Hunter Marshall) #1

I am trying to add a field to select events. I am using the elasticsearch-py client and the bulk api. Note that I am scanning and updating the same index. Is that a no-no?

I am using the scan helper:

for event in helpers.scan(es, index=input_index, q="evt_id:4624 OR evt_id:4634"):

I build up the actions into a list:

  action = {
      '_op_type': 'update',
      '_type': es_type,
      '_index': index_name,
      '_id': event['_id'],
      'doc': {'logon_duration14': logon_time},
  }
  actions.append(action)

Then upload ...

  # helpers.bulk accepts any iterable of actions, so the list can be passed directly
  print(helpers.bulk(es, actions, chunk_size=100))
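For readers following along, here is a minimal sketch of the action-building step as a generator, so the actions never have to sit in a list. The helper name build_update_actions, the value_for callable, and the sample hits are all hypothetical and only stand in for what helpers.scan() would yield. It runs without a cluster, so the shape of each action can be checked before anything is sent.

```python
def build_update_actions(hits, index_name, es_type, field_name, value_for):
    """Yield one bulk 'update' action per scanned hit.

    value_for is a hypothetical callable that returns the new field value
    for a hit, or None when the hit should be skipped.
    """
    for hit in hits:
        value = value_for(hit)
        if value is None:
            continue  # nothing to write for this event
        yield {
            '_op_type': 'update',
            '_type': es_type,      # as in the post; newer Elasticsearch versions drop types
            '_index': index_name,
            '_id': hit['_id'],
            'doc': {field_name: value},
        }


# Stand-in for the hits helpers.scan() would yield (assumed shape).
sample_hits = [
    {'_id': 'a1', '_source': {'evt_id': 4624}},
    {'_id': 'b2', '_source': {'evt_id': 4634}},
]

actions = list(build_update_actions(
    sample_hits, 'events', 'event', 'logon_duration', lambda hit: 42))

# With a live client, the generator itself could be handed to
# helpers.bulk(es, build_update_actions(...), chunk_size=100).
```

Printing the assembled actions like this before uploading makes it easy to tell a client-side logic problem from a real bulk failure.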

The print of the helpers.bulk() return value shows the expected number of successes and no errors, and the new field is created in many of the events. The issue is that repeated runs of the same code update a different set of events each time. On each rerun I add a digit to the field name, and events that received "logon_duration8" might not get a value for "logon_duration9". If an event does have a value for any of the added fields ("logon_duration1", "logon_duration2", etc.), then all of the added fields it has carry the same value.

Overall, it seems like some of the updates are sporadically failing?

Thx for any hints or help.


(Hunter Marshall) #2

This is not a bulk update failure. This is an operator error. :-/

Simple testing showed that the bulk update is working fine. Whatever actions get assembled are written to Elasticsearch successfully.

My processing logic was overlooking the non-ordered nature of data returned from the scan/scroll operation.
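A rough sketch of one way to make such logic independent of scroll order, not necessarily what was done here: collect the hits, sort them by timestamp on the client, and only then pair each logon (4624) with its logoff (4634). The field names '@timestamp' and 'logon_id', the integer evt_id values, and the timestamp format are assumptions for illustration; the original mapping is not shown in this thread.

```python
from datetime import datetime


def logon_durations(hits):
    """Map each logon document _id to its session length in seconds.

    Assumes (hypothetically) that every hit's _source carries 'evt_id',
    a 'logon_id' shared by the matching logon/logoff pair, and an
    '@timestamp' string in the format parsed below.
    """
    def ts(hit):
        return datetime.strptime(hit['_source']['@timestamp'],
                                 '%Y-%m-%dT%H:%M:%S')

    open_logons = {}  # logon_id -> (document _id, start time) of an unmatched 4624
    durations = {}
    # Sorting first means the arrival order from scan/scroll no longer matters.
    for hit in sorted(hits, key=ts):
        src = hit['_source']
        if src['evt_id'] == 4624:
            open_logons[src['logon_id']] = (hit['_id'], ts(hit))
        elif src['evt_id'] == 4634 and src['logon_id'] in open_logons:
            doc_id, started = open_logons.pop(src['logon_id'])
            durations[doc_id] = (ts(hit) - started).total_seconds()
    return durations


# The logoff arrives before the logon, as can happen with scan/scroll.
shuffled_hits = [
    {'_id': 'off-1', '_source': {'evt_id': 4634, 'logon_id': '0x1',
                                 '@timestamp': '2016-01-01T10:05:00'}},
    {'_id': 'on-1', '_source': {'evt_id': 4624, 'logon_id': '0x1',
                                '@timestamp': '2016-01-01T10:00:00'}},
]

result = logon_durations(shuffled_hits)
```

Sorting on the client holds every hit in memory at once, so for a large index it may be more practical to group by session key first and sort within each group.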

Sorry for the distraction.

