Subsequent bulk index requests fail after the first index

Hi There,

ES: 6.6
elastic/elasticsearch-js: 6.8
GKE: 1.12.7-gke.24
nodejs: 10.16.0

I currently have a Kubernetes cluster on GCP. What I usually do is kubectl port-forward elasticsearch-client... 9200 and test my nodejs script against http://localhost:9200.

After switching from elasticsearch-js:16 to the newer elasticsearch-js:6, and after a short hiatus, my scripts no longer work (bear with me... I believe they did work before the hiatus and after the version switch).

My script simply builds an array of alternating action and document entries for a bulk request. I then chunk that array (it is too large for a single bulk request) and iterate over the chunks, waiting for each request to finish before starting the next so as not to overwhelm the ES server. The first chunk posts successfully, but the script then hangs on the second post and eventually fails with an error.

const fs = require('fs')
const path = require('path')

const { logger, client } = await initialize()

// * Read in raw data
const attributes = JSON.parse(fs.readFileSync(path.resolve(__dirname, 'data.json')))

// * Format to match mapping and push onto action array
const actions = attributes.map(build)
  .reduce((actions, attribute) => {
    actions.push({ /* action */ }, attribute)
    return actions
  }, [])

// * Split 30000+ item array into 5000 item chunks
const actionChunks = chunk(actions, 5000)

// * Post each chunk
for (let i = 0; i < actionChunks.length; i++) {
  // * Post and wait for completion
  // ! It hangs here when i === 1
  const result = await client.bulk({ body: actionChunks[i] })

  // * Parse result for errors
  if (result.body.errors) {
    for (let j = 0; j < result.body.items.length; j++) {
      if (result.body.items[j].index.status > 299) {
        const error = result.body.items[j].index.error
        logger.error('error: %o', error)
      }
    }
  }

  // * Print progress
  logger.log(`Processed chunk ${i + 1}/${actionChunks.length}`)
}

// * Print complete
logger.log('Complete')

My error is one of the following, depending on whether gzip compression was enabled on the elasticsearch-js client (see the sketch after this list for how it is enabled):

  • (Enabled) AssertionError [ERR_ASSERTION]: zlib binding closed
  • (Disabled) Timeout Error
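
By "enabled" I mean the client was constructed with request compression turned on, roughly like this (a minimal sketch; compression is the documented client option, the node URL is just my port-forward):

const { Client } = require('@elastic/elasticsearch')

// With compression on I get the zlib assertion, with it off the timeout
const client = new Client({
  node: 'http://localhost:9200',
  compression: 'gzip' // gzip the request bodies sent to Elasticsearch
})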

So it would seem that ES is closing the connection for some currently unknown reason.

Now I am trying to figure out how to debug this situation, and I am completely lost. The GCP logs show nothing after the initial startup messages, and the ES client outputs nothing but the thrown error. I am hoping I can find some help here. Please let me know if you need to see anything else.

Thank you

Edit for more info

My topology is as follows:

  • 3x Master Nodes
  • 1x Data Node
  • 1x Client Node

My data node regularly logs the following, both during successful cron jobs run from within GCP and while the script above is running:

[2019-07-24T19:03:36,653][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-data-0] [gc][1787555] overhead, spent [410ms] collecting in the last [1.2s]

It seems the first request, no matter the action, consumes the connection. For example, adding

await client.ping()

after initialize() makes client.bulk hang at i === 0.
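
One way to see what the connection is actually doing is to log the client's observability events (a minimal sketch, using the client and logger returned by initialize(); 'response' and 'sniff' are documented @elastic/elasticsearch events, and the exact shape of result.meta may differ between client versions):

client.on('response', (err, result) => {
  // Log which node answered (or failed) each request
  if (err) {
    logger.error('response error: %o', err)
  } else {
    logger.log(`response from ${result.meta.connection.url.href}`)
  }
})

client.on('sniff', (err, result) => {
  // Fires whenever the client re-discovers the cluster topology
  logger.log('sniff: %o', err || result.meta)
})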

This had everything to do with the @elastic/elasticsearch Client sniffing feature. Enabling sniffing makes the client discover all nodes in the cluster and remove the old ones that don't match. This means my localhost connection was being removed on the first sniff and replaced with internal cluster IPs, which are not reachable through the port-forward.
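
In other words, when connecting through a port-forward the client has to be created without sniffing, roughly like below. This is only a sketch; the sniff* options are the documented client settings, and they all default to off, so the fix is simply not to turn them on:

const { Client } = require('@elastic/elasticsearch')

// Keep sniffing off so the client never swaps http://localhost:9200
// for the cluster-internal node addresses it discovers.
const client = new Client({
  node: 'http://localhost:9200',
  sniffOnStart: false,           // don't sniff when the client is created
  sniffOnConnectionFault: false, // don't sniff after a failed request
  sniffInterval: false           // never sniff on a timer
})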
