Hello!
My database contains 500,000 documents in the same index, and I need to update some fields of every document every week. I use Scrapy to get the updated data for each document.
Instead of updating each document one by one, to increase efficiency I would like to build a request that updates the first 2000 documents, then the next 2000, and so on.
I do the same thing to create the documents in the first place, using helpers.bulk(es, self.actions), with self.actions containing the different actions:
self.actions = [
    {
        '_index': 'myindex',
        '_type': 'mytype',
        '_id': idItem,
        '_source': {
            'title': 'title1',
            'views': 100,
            'likes': 200,
        },
    },
    {
        '_index': 'myindex',
        '_type': 'mytype',
        '_id': idItem2,
        '_source': {
            'title': 'title2',
            'views': 150,
            'likes': 250,
        },
    },
    ....
]
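To make it concrete, here is roughly how I build that actions list from the scraped items before passing it to helpers.bulk(es, actions) (the function name and the item field names are illustrative, not my exact code):

```python
def build_actions(items):
    """Turn scraped items into Elasticsearch bulk indexing actions.

    Each item is assumed to be a dict with 'id', 'title', 'views'
    and 'likes' keys (illustrative field names).
    """
    return [
        {
            '_index': 'myindex',
            '_type': 'mytype',
            '_id': item['id'],
            '_source': {
                'title': item['title'],
                'views': item['views'],
                'likes': item['likes'],
            },
        }
        for item in items
    ]

# The resulting list is then sent in one call:
#     helpers.bulk(es, build_actions(items))
```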
I've read a lot of topics and similar questions, but I can't find the answer: if I use 'op_type': 'update', it doesn't keep the fields that I don't want to update... Furthermore, if I use a script with the Update API, I can't update 2000 documents at the same time...
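To illustrate the batching I'm after, here is a sketch of how I imagine splitting the actions into groups of 2000 and sending each group in a single bulk call (the helper function is my own illustration, not existing code; note that helpers.bulk itself also takes a chunk_size parameter that does similar splitting internally):

```python
from itertools import islice

def chunked(iterable, size=2000):
    """Yield successive lists of at most `size` actions."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Intended usage (es and actions come from my existing code):
#     for batch in chunked(actions, 2000):
#         helpers.bulk(es, batch)
```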
Do you have a solution (in Python)?
Thanks a lot!