Elasticsearch Bulk Indexing on an Azure VM

We are running Elasticsearch on an Azure VM to bulk index our data. The largest input JSON we've tested so far is around 70 MB, and each time it fails with Node error 134, which locally translates to an out-of-memory error. We tried increasing the heap size to 5 GB, but to no avail. Any advice on how we can proceed?

    const { body: bulkResponse } = await client.bulk({
        refresh: true,
        body,
    });

This is the segment that fails each time. Is there a way to break this large JSON into smaller pieces, bulk index them iteratively, and collate the results at the end?
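One way the "break it down and collate" idea could look is sketched below. It assumes `body` is the flat array form accepted by `client.bulk`, with exactly two entries per document (an `index` action followed by its source), and that `client` is an `@elastic/elasticsearch` 7.x `Client`. The names `chunkBulkBody` and `bulkInChunks` and the default of 1000 pairs per request are hypothetical, not part of the library. It only limits the size of each request; the full `body` array is still held in memory.

```javascript
// Split a flat bulk body ([action, source, action, source, ...]) into
// smaller bodies holding at most `pairsPerChunk` action/source pairs each.
// Assumes every operation occupies exactly two array entries.
function chunkBulkBody(body, pairsPerChunk) {
  if (!Number.isInteger(pairsPerChunk) || pairsPerChunk < 1) {
    throw new RangeError('pairsPerChunk must be a positive integer');
  }
  const step = pairsPerChunk * 2; // two entries per document
  const chunks = [];
  for (let i = 0; i < body.length; i += step) {
    chunks.push(body.slice(i, i + step));
  }
  return chunks;
}

// Send the chunks one at a time and collate the per-item results.
// Only the last request asks for a refresh.
async function bulkInChunks(client, body, pairsPerChunk = 1000) {
  const chunks = chunkBulkBody(body, pairsPerChunk);
  const collated = { errors: false, items: [] };
  for (let i = 0; i < chunks.length; i++) {
    const { body: bulkResponse } = await client.bulk({
      refresh: i === chunks.length - 1,
      body: chunks[i],
    });
    collated.errors = collated.errors || bulkResponse.errors;
    for (const item of bulkResponse.items) {
      collated.items.push(item);
    }
  }
  return collated;
}
```

Calling `const { errors, items } = await bulkInChunks(client, body, 500)` would then give a single combined list of item results to inspect, in the same order as the input documents.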

Update:

Found the Bulk Helpers, which break a large input into smaller bulk requests, but the helper completes the call itself rather than returning the indexing results for processing further along.

    const result = await client.helpers.bulk({
        datasource: body,
        onDocument (doc) {
            return {
                index: { _index: 'my-index' },
            }
        },
        onDrop (doc) {
            console.log(doc)
        },
        retries: 3,
        refreshOnCompletion: true
    })
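For processing further along, one possible approach is to keep what the helper hands back. As far as I understand the 7.x client, the promise resolves to a summary object with counts such as `total`, `successful`, and `failed`, and `onDrop` receives details of each document that could not be indexed. The sketch below collects both. The wrapper name `indexWithHelper` and the `flushBytes` value are assumptions chosen for illustration, and `docs` is assumed to contain only the documents themselves, without action lines.

```javascript
// Run the bulk helper and return both its summary and any dropped documents.
// `client` is assumed to be an @elastic/elasticsearch 7.x Client instance;
// `docs` is assumed to be an array, stream, or async iterator of documents.
async function indexWithHelper(client, docs, indexName) {
  const dropped = [];
  const stats = await client.helpers.bulk({
    datasource: docs,
    onDocument(doc) {
      // One index action per document; the helper adds the source itself.
      return { index: { _index: indexName } };
    },
    onDrop(info) {
      // Keep failures for later inspection instead of only logging them.
      dropped.push(info);
    },
    flushBytes: 1000000, // assumed value: send a request roughly every 1 MB
    retries: 3,
    refreshOnCompletion: true,
  });
  return { stats, dropped };
}
```

The caller can then check `stats.failed` and walk `dropped` once the whole run has finished.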

The helper sometimes produces this warning:

 Warning: Unexpected call to 'log' on the context object after function execution has completed. Please check for asynchronous calls that are not awaited or calls to 'done' made before function execution completes.

What do the Elasticsearch logs show at this time?

Hi,

Really sorry for the late response. I got sidetracked onto other things.

I'm testing it right now, and it times out at 30 minutes without sending back any response.

The Elasticsearch logs show the following:

{"type": "server", "timestamp": "2020-10-12T14:50:01,079Z", "level": "INFO", "component": "o.e.c.m.MetadataMappingService", "cluster.name": "talos-elasticsearch", "node.name": "talosnode-01", "message": "[words/xUmNWjA3RUahEqfrY1NCZQ] create_mapping [_doc]", "cluster.uuid": "t7Iu5owsRjGp0VoDKudeSA", "node.id": "ApGtWNeiQZiEKJcwbQ4ejA" }
[2020-10-12T14:50:00,825][INFO ][o.e.c.m.MetadataCreateIndexService] [talosnode-01] [words] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
[2020-10-12T14:50:01,079][INFO ][o.e.c.m.MetadataMappingService] [talosnode-01] [words/xUmNWjA3RUahEqfrY1NCZQ] create_mapping [_doc]

Node stats say:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
