We are using a VM in Azure to run Elasticsearch and do some bulk indexing on our data. The largest input JSON we've tested with so far is around 70 MB, and each time the Node process fails with exit code 134, which locally corresponds to an out-of-memory error. We tried increasing the heap size to 5 GB, but to no avail. Any advice on how we can proceed?
// A single bulk call with the full ~70 MB payload; this is where the process dies
const { body: bulkResponse } = await client.bulk({
  refresh: true,
  body,
});
This is the segment that fails each time. Is there any way to break this large JSON down into smaller batches, bulk index them iteratively, and collate the results at the end?
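Roughly what we have in mind is something like the sketch below (the batch size of 1000 documents is an arbitrary placeholder, and we're assuming body is still the flat array of alternating action/source entries):

// Sketch: split the flat action/source array into batches and bulk index each one
const DOCS_PER_BATCH = 1000; // placeholder size
const allItems = [];

for (let i = 0; i < body.length; i += DOCS_PER_BATCH * 2) {
  const chunk = body.slice(i, i + DOCS_PER_BATCH * 2);
  const { body: response } = await client.bulk({
    refresh: false, // refresh once at the end instead of after every batch
    body: chunk,
  });
  // Collate the per-document results from every batch
  allItems.push(...response.items);
}

await client.indices.refresh({ index: 'my-index' });

const errors = allItems.filter(item => item.index && item.index.error);
console.log(`indexed ${allItems.length} documents, ${errors.length} with errors`);

Would something along these lines be a reasonable direction, or is there a built-in way to do it?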
Update:
Found the Bulk helper, which breaks larger payloads down into smaller flushes, but it completes the whole call internally rather than handing back the indexing results for further processing downstream.
const result = await client.helpers.bulk({
  // The helper expects the raw documents here (an array, async iterator, or stream),
  // not the action/source pairs used by client.bulk
  datasource: body,
  // Return the bulk action to perform for each document
  onDocument (doc) {
    return {
      index: { _index: 'my-index' },
    }
  },
  // Called for every document that could not be indexed after the retries
  onDrop (doc) {
    console.log(doc)
  },
  retries: 3,
  refreshOnCompletion: true
})
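One workaround we're considering is to collect the rejected documents ourselves instead of just logging them, since onDrop fires once per document that could not be indexed (a sketch; we're assuming the object passed to onDrop carries enough detail about the failure for our downstream processing):

const droppedDocs = [];

const result = await client.helpers.bulk({
  datasource: body,
  onDocument (doc) {
    return { index: { _index: 'my-index' } }
  },
  // Accumulate every rejected document instead of only logging it
  onDrop (doc) {
    droppedDocs.push(doc)
  },
  retries: 3,
  refreshOnCompletion: true
})

// result is only a summary of the run; the per-document detail we kept is in droppedDocs
console.log(result)
console.log(`dropped ${droppedDocs.length} documents`)

That still doesn't give us the successful index responses, though.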
The helper run sometimes produces this warning:
Warning: Unexpected call to 'log' on the context object after function execution has completed. Please check for asynchronous calls that are not awaited or calls to 'done' made before function execution completes.