I am using multi search (msearch from elasticsearch-py) to look up documents by id across a list of indices. The ids are explicitly assigned. The goal is to find which documents already exist (given a list of ids), update those that do, and create a new document for each id that does not. Since there could be more than 10K ids to search for, I use this piece of code for the search:

    results = []
    # Split the ids into chunks of at most 10,000 (the "size" used below).
    chunks = [list_with_ids[x:x+10000] for x in range(0, len(list_with_ids), 10000)]
    for chunk in chunks:
        if len(chunk) > 0:
            request = []
            # One header/body pair per index, all searching for the same ids.
            for index in list_of_indices:
                req_head = {'index': index}
                req_body = {
                    "size": 10000,
                    "query": {
                        "ids": {
                            "values": chunk
                        }
                    },
                }
                request.extend([req_head, req_body])
            try:
                result = client.msearch(body=request)
            except:
                # Any exception skips this whole chunk without recording it.
                continue
            for response in result['responses']:
                results.append(response)

The query runs fine for the most part, but sometimes it does not seem to return all the documents that match. As a result, documents with some of the ids appear not to exist in Elasticsearch, and the rest of my code (not shown here) creates new documents for them, producing duplicates. How can I ensure that all matching documents are returned?
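
To help narrow this down, here is a minimal sketch of a check that could be run over the collected results before any document is treated as missing. The helper name summarize_msearch is hypothetical and not part of elasticsearch-py. It assumes the usual msearch response shape, where each item in responses is either a search result (with hits, _shards and timed_out) or an object carrying an error key. It can only report on responses that were actually received, so chunks skipped by the except branch will not show up here.

```python
def summarize_msearch(responses, requested_ids):
    """Return (found_ids, missing_ids, problems) for a list of msearch responses.

    Hypothetical helper for diagnosis only; it does not talk to Elasticsearch.
    """
    found = set()
    problems = []
    for position, response in enumerate(responses):
        if "error" in response:
            # This sub-search failed outright, so it carries no hits.
            problems.append((position, "error", response["error"]))
            continue
        if response.get("timed_out"):
            # Results may be partial when a search times out.
            problems.append((position, "timed_out", None))
        failed_shards = response.get("_shards", {}).get("failed", 0)
        if failed_shards:
            # Hits from the failed shards are absent from this response.
            problems.append((position, "shard_failures", failed_shards))
        for hit in response.get("hits", {}).get("hits", []):
            found.add(hit["_id"])
    missing = set(requested_ids) - found
    return found, missing, problems


# Possible usage with the variables from the question:
# found, missing, problems = summarize_msearch(results, list_with_ids)
# Only ids in `missing` with an empty `problems` list look genuinely absent.
```

If problems is non-empty, an id in missing may simply belong to a sub-search that failed or returned partial results, rather than to a document that does not exist.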