Slow upserts


(Neera Vats) #1

Hello everyone,

I am using the Python bulk helpers to update an Elasticsearch index. If a document already exists, an update script appends to an array of JSON objects. It takes about 1 hour to update 10 million records. Are upserts in ES generally that slow, or am I doing something wrong? I would appreciate any insights on this. Thanks!

Here is a snippet of the Python script I am using:

```python
action = {
    "_op_type": "update",
    "_id": ...,
    "_routing": ...,
    "upsert": {
        "id": ...,
        "val1": ...,
        "val2": ...,  # nested objects in the schema
    },
    "script": "my-script",
    "params": {
        "new_val1": ...,
        "new_val2": ...,
    },
}

actions.append(action)

if len(actions) == 1000:
    # streaming_bulk returns a generator; it must be consumed
    # for the requests to actually be sent.
    for ok, response in helpers.streaming_bulk(
            es, actions, index='index', doc_type='type',
            chunk_size=1000, request_timeout=400):
        pass
    actions = []  # start a fresh batch
```

my-script:

```
ctx._source.val2 += new_val2; ctx._source.val1 += new_val1
```
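For context, the batching logic around the snippet above can be sketched without a cluster. This is a minimal, self-contained version (the builder function, field names, and record shape are hypothetical; in the real script each yielded batch would be passed to `helpers.streaming_bulk`):

```python
def make_upsert_action(doc_id, new_val1, new_val2):
    """Build one scripted-upsert bulk action (field names are illustrative)."""
    return {
        "_op_type": "update",
        "_id": doc_id,
        "upsert": {"id": doc_id, "val1": new_val1, "val2": new_val2},
        "script": "my-script",
        "params": {"new_val1": new_val1, "new_val2": new_val2},
    }

def batched_actions(records, batch_size=1000):
    """Yield lists of bulk actions, batch_size at a time."""
    batch = []
    for doc_id, v1, v2 in records:
        batch.append(make_upsert_action(doc_id, v1, v2))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short batch
        yield batch
```

Resetting the batch after each flush matters: without it, every flush re-sends all previously accumulated actions, which alone could explain a large slowdown.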


(Nik Everett) #2
  1. Try to use ``` to make your posted code easier to read.
  2. Try to post questions using cURL or Sense syntax. Any other client, supported or not, limits the audience who can read your post.
  3. Updates in Elasticsearch mark the old copy of the document as deleted and then index a new copy of the document. Scripted updates must fetch the document and apply the script as well. Plain index operations don't have to fetch the document, and an index that doesn't replace an existing document doesn't have to mark one as deleted, etc.
  4. It's hard to say where the time is going without some analysis, but 10 million documents in an hour is about 2,777 updates a second, which seems like a fair clip to me, depending on the hardware and the documents.
  5. You can investigate setting ctx.op = "noop" as documented here, which should prevent extra work when the script doesn't change the document.
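Point 5 can be sketched as follows. This is a hedged example, not code from the thread: the inline script body and field names are hypothetical, and the nested `"script"` object shown here is the newer inline-script form, so the exact syntax depends on your Elasticsearch version. The idea is that the script sets `ctx.op = "noop"` when the incoming value is already present, so Elasticsearch skips the delete-and-reindex cycle for unchanged documents:

```python
# Hypothetical inline script: only append (and therefore reindex) when
# the incoming value is not already in the array.
NOOP_SCRIPT = (
    "if (ctx._source.val2.contains(params.new_val2)) { ctx.op = 'noop' } "
    "else { ctx._source.val2.add(params.new_val2) }"
)

def make_noop_aware_action(doc_id, new_val2):
    """Bulk update action that becomes a no-op for unchanged documents."""
    return {
        "_op_type": "update",
        "_id": doc_id,
        "upsert": {"id": doc_id, "val2": [new_val2]},
        "script": {"source": NOOP_SCRIPT, "params": {"new_val2": new_val2}},
    }
```

A no-op update still pays for the document fetch and the script run, but avoids the write and the extra deleted document in the segment.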

(system) #3