We are using a VM in Azure to run Elasticsearch and do some bulk indexing on our data. The largest input JSON we've tested with so far is around 70 MB, and each time the Node process fails with exit code 134, which locally corresponds to an out-of-memory error. We tried increasing the heap size to 5 GB, but to no avail. Any advice on how we can proceed?
// A single bulk call with the full ~70 MB payload; this is where the process dies
const { body: bulkResponse } = await client.bulk({
  refresh: true,
  body,
});
This is the segment that fails each time. Is there any way to break this large JSON down into smaller batches, bulk index them iteratively, and collate the results at the end?
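Roughly what we have in mind is something like the sketch below (the batch size of 1000 documents is an arbitrary placeholder, and we're assuming body is still the flat array of alternating action/source entries):

// Sketch: split the flat action/source array into batches and bulk index each one
const DOCS_PER_BATCH = 1000; // placeholder size
const allItems = [];

for (let i = 0; i < body.length; i += DOCS_PER_BATCH * 2) {
  const chunk = body.slice(i, i + DOCS_PER_BATCH * 2);
  const { body: response } = await client.bulk({
    refresh: false, // refresh once at the end instead of after every batch
    body: chunk,
  });
  // Collate the per-document results from every batch
  allItems.push(...response.items);
}

await client.indices.refresh({ index: 'my-index' });

const errors = allItems.filter(item => item.index && item.index.error);
console.log(`indexed ${allItems.length} documents, ${errors.length} with errors`);

Would something along these lines be a reasonable direction, or is there a built-in way to do it?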
Update:
Found the Bulk helper, which breaks larger payloads down into smaller flushes, but it completes the whole call internally rather than handing back the indexing results for further processing downstream.
const result = await client.helpers.bulk({
  // The helper expects the raw documents here (an array, async iterator, or stream),
  // not the action/source pairs used by client.bulk
  datasource: body,
  // Return the bulk action to perform for each document
  onDocument (doc) {
    return {
      index: { _index: 'my-index' },
    }
  },
  // Called for every document that could not be indexed after the retries
  onDrop (doc) {
    console.log(doc)
  },
  retries: 3,
  refreshOnCompletion: true
})
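One workaround we're considering is to collect the rejected documents ourselves instead of just logging them, since onDrop fires once per document that could not be indexed (a sketch; we're assuming the object passed to onDrop carries enough detail about the failure for our downstream processing):

const droppedDocs = [];

const result = await client.helpers.bulk({
  datasource: body,
  onDocument (doc) {
    return { index: { _index: 'my-index' } }
  },
  // Accumulate every rejected document instead of only logging it
  onDrop (doc) {
    droppedDocs.push(doc)
  },
  retries: 3,
  refreshOnCompletion: true
})

// result is only a summary of the run; the per-document detail we kept is in droppedDocs
console.log(result)
console.log(`dropped ${droppedDocs.length} documents`)

That still doesn't give us the successful index responses, though.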
The helper run sometimes produces this warning:
Warning: Unexpected call to 'log' on the context object after function execution has completed. Please check for asynchronous calls that are not awaited or calls to 'done' made before function execution completes.