Batched script upserts too verbose


(bryan rasmussen) #1

I have a document type, product, whose documents are not very verbose. The following is a typical example document:

{
"_index": "products2",
"_type": "product",
"_id": "zanox95879211689b6d8a6e6afea5c3e4b92f5fZalando",
"_score": 4.5971155,
"_source": {
"doc": null,
"feedname": "Zalando",
"hash": 1302167889,
"retailerId": "879211689b6d8a6e6afea5c3e4b92f5f",
"global": true,
"uploadedImage": false,
"image": "http://i2.ztat.net/large/CL/11/1I/00/IQ/11/CL111I00I-Q11@12.jpg",
"updatefields": [],
"updatemade": false,
"retailer": "zalando",
"updateDate": "2016-01-05T14:42:51.855Z",
"combined": "",
"currency": "DKK",
"searchid": "zanox_879211689b6d8a6e6afea5c3e4b92f5f",
"category": "Damer / Sko / Sandaler / Sandaler med rem",
"price": "499.00",
"stock": "null",
"updatenumber": 1,
"elink": "http://ad.zanox.com/ppc/?36399442C827456676&ULP=[[clarks-sandaler-black-cl111i00i-q11.html?wmc=AFF45_ZX_DK_CAM02.[Partner_ID]..&opc=2211]]",
"description": "Clarks RISI HOP Sandaler black Sko på Zalando.dk | Udvendigt materiale: læder, For: imiteret læder, Sål: kunststof, Sål: imiteret læder | Sko nu gratis levering",
"name": "Clarks RISI HOP Sandaler black",
"brand": "Clarks",
"published": true,
"createDate": "2016-01-05T14:42:51.855Z",
"disabled": false,
"deliveryCost": ""
}
}

I am using a hosted service.

I can send about 400 documents of this type per batch every 2.1 seconds without stressing my index or getting any dropped requests. I do an upsert because I don't know beforehand whether the document exists. Maybe I can go up slightly in batch size or down slightly in interval, but I don't think there is much room to negotiate. Some documents also have relatively big descriptions, so the request size can still vary a bit.

If I do a scripted upsert, each request needs a params field holding the values I will be updating (which is most of the fields), a script that is itself verbose because it updates each of those fields, and then an upsert document. This makes the requests I send roughly 3 times the size they were before, so I can no longer handle 400 documents at a time.
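To make the size difference concrete, here is a rough sketch in Python that builds the request body line for one document both ways and compares the byte counts. The bulk action line that precedes each body is identical in both cases and is left out. The `inline` key for the script, the generated script text, and the five-field `sample` document are assumptions for illustration; the exact script syntax depends on the Elasticsearch version and scripting language in use.

```python
import json


def plain_upsert_line(doc):
    # Body line for a partial-document upsert: the document is sent once.
    return json.dumps({"doc": doc, "doc_as_upsert": True})


def scripted_upsert_line(doc):
    # Body line for a scripted upsert: the same values are sent twice
    # (once as params, once as the upsert document), plus a script that
    # names every field. The script text is a hypothetical example.
    script = "; ".join(f"ctx._source.{field} = {field}" for field in doc)
    return json.dumps({
        "script": {"inline": script, "params": doc},
        "upsert": doc,
    })


# A cut-down version of the example document above.
sample = {
    "name": "Clarks RISI HOP Sandaler black",
    "brand": "Clarks",
    "price": "499.00",
    "currency": "DKK",
    "retailer": "zalando",
}

plain_size = len(plain_upsert_line(sample).encode("utf-8"))
scripted_size = len(scripted_upsert_line(sample).encode("utf-8"))
ratio = scripted_size / plain_size

print(f"plain: {plain_size} bytes, scripted: {scripted_size} bytes, ratio: {ratio:.2f}")
```

With short field values the script text is a noticeable share of the request; with long descriptions the duplicated values dominate and the ratio moves closer to 2.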

Soon we will have 300,000 documents, and I expect a full pass of scripted updates to take approximately 2 hours (rounding up), degrading the performance of other index operations during that time. We are aiming for 1 million documents this year and 10 million the year after. Beyond 1 million I guess I can do some index management, but 1 million will probably still be in a single index.
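The estimate can be checked with some back-of-the-envelope arithmetic. This sketch assumes the batch interval stays at 2.1 seconds and that tripling the request size cuts the batch to a third of 400 documents; both are simplifications, and `full_pass_seconds` is just a helper name made up for the example. Under those assumptions a plain upsert pass over 300,000 documents comes to roughly 26 minutes and a scripted pass to roughly 79 minutes, which is in line with the "2 hours, rounding up" figure.

```python
import math


def full_pass_seconds(total_docs, docs_per_batch, seconds_per_batch):
    # Number of batches needed (rounded up) times the interval between batches.
    return math.ceil(total_docs / docs_per_batch) * seconds_per_batch


TOTAL_DOCS = 300_000
PLAIN_BATCH = 400                 # documents per batch today
SCRIPTED_BATCH = PLAIN_BATCH // 3  # assumed: requests are about 3x larger
INTERVAL = 2.1                    # seconds between batches

plain_seconds = full_pass_seconds(TOTAL_DOCS, PLAIN_BATCH, INTERVAL)
scripted_seconds = full_pass_seconds(TOTAL_DOCS, SCRIPTED_BATCH, INTERVAL)

print(f"plain upserts:    {plain_seconds / 60:.0f} minutes")
print(f"scripted upserts: {scripted_seconds / 60:.0f} minutes")
```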

At any rate, I would like to decrease the amount of time it takes to index. One option would be a way to make the scripted update less verbose: for example, instead of having to use a params object, could the script just take the values it needs from the upsert document? Probably that is not supported. Are there any other methods that people use to decrease indexing time for scripted upserts?

