I'm doing a bulk upsert using the bulk helper from the Elasticsearch Node.js client library:
    const response = await ELSClient.helpers.bulk({
      concurrency: 20,
      datasource: data,
      onDocument (doc) {
        return [
          { update: { _index: index, _id: doc.id } },
          { doc_as_upsert: true }
        ]
      }
    })
    if (response.failed > 0) {
      console.log('FAILED', response.failed)
    }
I'm upserting about 4.4 million records, 5000 per batch, but only about 3.8 million records end up being created. When I log the response, I don't see any error messages indicating that any documents failed.
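To rule out silently dropped documents, here is a minimal sketch of how the same call could record each rejected document through the helper's `onDrop` callback. The wrapper name `upsertAndCollectDrops` is hypothetical, and the fields read from the drop object (`status`, `error`, `document`) are what I understand the client documentation to describe, so treat them as assumptions:

```javascript
// Hypothetical wrapper around the bulk helper; `client`, `data` and `index`
// are supplied by the caller. Not a definitive implementation.
async function upsertAndCollectDrops (client, data, index) {
  const dropped = []
  const stats = await client.helpers.bulk({
    concurrency: 20,
    datasource: data,
    onDocument (doc) {
      return [
        { update: { _index: index, _id: doc.id } },
        { doc_as_upsert: true }
      ]
    },
    // Invoked for each document that could not be indexed after retries.
    // Field names on `drop` are assumed from the client docs.
    onDrop (drop) {
      dropped.push({
        id: drop.document.id,
        status: drop.status,
        error: drop.error
      })
    }
  })
  // `stats` is the same summary object that the original snippet logs.
  return { stats, dropped }
}
```

Logging the whole `stats` object alongside `dropped.length` after the run would show whether the number of documents the helper processed matches the number of rows sent.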
When I initially ran the upsert, only about 3 million records were created. The second time I ran it, an additional 800k were created. The server is running at a pretty high CPU percentage but isn't spiking over 100%, and none of the CPU credits are being used.
The documents themselves are pretty small. Maybe 10 fields. These records are coming from a Postgres JOIN table so I know the IDs are unique.
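Since an upsert with a repeated `_id` updates the existing document instead of creating a new one, the uniqueness assumption is worth checking directly. Below is a minimal sketch; `countDuplicateIds` is a hypothetical helper (not part of the Elasticsearch client), and it assumes `data` is an in-memory array of objects with an `id` field:

```javascript
// Hypothetical helper: counts rows whose `id` was already seen earlier.
// IDs are compared as strings, since Elasticsearch stores `_id` as a string.
function countDuplicateIds (rows) {
  const seen = new Set()
  let duplicates = 0
  for (const row of rows) {
    const key = String(row.id)
    if (seen.has(key)) {
      duplicates++
    } else {
      seen.add(key)
    }
  }
  return { total: rows.length, unique: seen.size, duplicates }
}
```

Running `console.log(countDuplicateIds(data))` before the upsert would show whether `unique` matches the 4.4 million rows I expect to be created.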
I understand this is a very broad question with many possible causes. I'm just looking for suggestions on how to debug this, or on what the underlying problem could be.