Slow ES ingestion using Dataflow template with usePartialUpdate

We are using this template to ingest data once a day from BigQuery to Elasticsearch.

It creates a Dataflow job using the following relevant parameters:

    "usePartialUpdate": "true",
    "batchSizeBytes": "5242880",
    "bulkInsertMethod": "INDEX",
    "maxNumWorkers": "30",
    "workerMachineType": "n1-standard-1"

Total Index size: 70 million rows.
Job updates daily: 7 million rows.
Index refreshes every 30 minutes.
Index snapshotting happens once a day, outside the ingestion time.

By changing only the parameter usePartialUpdate from false to true, we see write throughput drop from around 7,000 records/second to around 1,500 records/second.
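
To put that in terms of job duration, here is a rough back-of-the-envelope calculation. It assumes the two throughput figures hold steady for the whole 7 million row daily update, which is a simplification; the real rate probably varies during the job.

```python
# Rough estimate only: assumes a constant write rate for the entire job.
DAILY_ROWS = 7_000_000  # rows updated per daily job (from the figures above)


def job_minutes(rows: int, records_per_second: float) -> float:
    """Return the estimated write time in minutes at a constant rate."""
    return rows / records_per_second / 60


full_overwrite = job_minutes(DAILY_ROWS, 7_000)  # usePartialUpdate = false
partial_update = job_minutes(DAILY_ROWS, 1_500)  # usePartialUpdate = true

print(f"full overwrite: {full_overwrite:.1f} min")
print(f"partial update: {partial_update:.1f} min")
print(f"slowdown factor: {partial_update / full_overwrite:.1f}x")
```

So the same daily load goes from roughly a quarter of an hour to well over an hour of write time.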

How come sending an update to one field of a record is slower than sending the entire record to overwrite it?
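
For context, this is my understanding of how the two request shapes differ in the Elasticsearch Bulk API. It is only a minimal sketch: the index name, document ID, and field names are made up for illustration, and I am assuming (not having confirmed it from the template source) that usePartialUpdate makes the template send bulk `update` actions carrying a partial `doc` instead of `index` actions carrying the full document.

```python
import json


def index_action(index: str, doc_id: str, full_doc: dict) -> list[dict]:
    """Bulk 'index' action: the whole document replaces whatever is stored."""
    return [{"index": {"_index": index, "_id": doc_id}}, full_doc]


def update_action(index: str, doc_id: str, changed_fields: dict) -> list[dict]:
    """Bulk 'update' action: only the changed fields are sent, wrapped in 'doc'."""
    return [{"update": {"_index": index, "_id": doc_id}}, {"doc": changed_fields}]


def to_ndjson(lines: list[dict]) -> str:
    """Serialize action/source pairs as newline-delimited JSON for _bulk."""
    return "".join(json.dumps(line) + "\n" for line in lines)


# Hypothetical record; names and values are placeholders, not our real schema.
full = {"name": "example", "price": 10, "stock": 3}
partial = {"stock": 2}

index_body = to_ndjson(index_action("my-index", "42", full))
update_body = to_ndjson(update_action("my-index", "42", partial))

print(index_body)
print(update_body)
```

The update payload is smaller on the wire, but as far as I understand, Elasticsearch still has to fetch the existing document, merge the partial `doc` into it, and reindex the result, whereas `index` simply writes the new document. I would not have expected that extra read to account for a drop of this size, though.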

Elastic cluster size: 180 GB storage | 4 GB RAM | Up to 8 vCPU - Single Zone. 
Elastic version: v8.4.3
Elastic is managed through GCP marketplace.

BigQuery, Dataflow and Elastic are all in the same europe-west1 GCP region.

I found a few references from a few years ago describing issues that I hope have been fixed by now.
