Bulk Upsert isn't Importing All Records

I'm doing a bulk upsert using the Elasticsearch Node.js client's bulk helper.

const response = await ELSClient.helpers.bulk({
  concurrency: 20,
  datasource: data,
  onDocument (doc) {
    return [
      { update: { _index: index, _id: doc.id } },
      { doc_as_upsert: true }
    ]
  }
})
if (response.failed > 0) {
  console.log('FAILED', response.failed)
}

I'm upserting about 4.4 million records, 5000 per batch, but only about 3.8 million records end up being created. When I log the response, I don't see any error messages indicating that documents failed.

When I initially ran the upsert, only about 3 million records were created. The second time I ran it, an additional 800k were created. The server is running at a pretty high CPU percentage but isn't spiking over 100%, and none of the CPU credits are being used.

The documents themselves are pretty small, maybe 10 fields each. The records come from a Postgres JOIN table, so I know the IDs are unique.

I understand this is a very broad problem with lots of possible causes. I'm just looking for some suggestions on how to debug this, or what the issue could be.
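For reference, here's roughly how I'm comparing the totals on each side (this sketch assumes the v8 client, where the response is the body itself; pgClient is a node-postgres client and join_table stands in for my real table):

await ELSClient.indices.refresh({ index })          // make all writes visible to counts
const { count } = await ELSClient.count({ index })  // documents actually in the index
const { rows } = await pgClient.query('SELECT COUNT(DISTINCT id) AS n FROM join_table')
console.log('postgres:', rows[0].n, 'elasticsearch:', count)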

Hi,

Have you tried a batch size below 5k? For example, half of that.
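Also, the bulk helper only increments the failed counter when a document still fails after its retries, but it accepts an onDrop callback that fires for each rejected document, which would show you what (if anything) is being dropped. Roughly, based on your snippet (the exact fields on the dropped object may vary by client version):

const response = await ELSClient.helpers.bulk({
  concurrency: 20,
  datasource: data,
  onDocument (doc) {
    return [
      { update: { _index: index, _id: doc.id } },
      { doc_as_upsert: true }
    ]
  },
  onDrop (failure) {
    // Called once per document the helper gives up on;
    // status and error describe why Elasticsearch rejected it
    console.error('DROPPED', failure.status, failure.error)
  }
})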

Well, I initially tried it with 10k, so 5k is already half.

In the end this was an issue with my Postgres query and had nothing to do with Elasticsearch. I wasn't ordering my paginated query by a unique key, so the batches skipped some rows and the upsert never got through all the items.
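For anyone who hits the same thing: ordering only by a non-unique column gives Postgres no stable row order, so LIMIT/OFFSET pages can skip or repeat rows between queries. Adding the unique id as a tiebreaker fixed it. Illustrative only; the table and column names here are placeholders:

// Unstable: many rows share created_at, so page boundaries shift between queries
const before = 'SELECT * FROM join_table ORDER BY created_at LIMIT 5000 OFFSET $1'
// Stable: the unique id breaks ties, so every row lands in exactly one page
const after  = 'SELECT * FROM join_table ORDER BY created_at, id LIMIT 5000 OFFSET $1'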
