Dec 22nd, 2025: [EN] Scaling Bulk Indexing with Elastic Cloud Serverless

Elastic Cloud Serverless (ECS) uses a split-tier architecture that separates indexing from search, so each tier scales independently based on demand on the system. On the backend, each tier has dedicated node pools for managing its compute resources. The indexing tier expands in a step pattern as indexing load increases (think indexing pressure, CPU and memory utilization, write queue saturation) and shrinks as demand eases. The search tier grows with the volume of data ingested and, optionally, with search load.

The great part is that this scalable system requires no end-user management; the platform is fully managed by Elastic. Here are some tips for optimizing indexing performance on the Serverless platform.

  1. New Serverless projects start with the minimum possible compute resources, meaning compute is consumed only as needed.
  2. Aim to keep median bulk latency between 200 and 1,000 milliseconds. Bulk requests to Serverless have a 200 ms minimum response time due to scheduled object store flushes, which are part of how your data is durably persisted to object storage and made available for search.
  3. Tuning bulk size and worker count is necessary for high ingest performance. Start by finding the response time of a single bulk request from a single client, then work up (see the sketch after this list). Be sure to load any index templates before running experiments.
  4. If index size and scale are a concern, use data streams rather than standard indices. Data streams differ from standard indices in that they require a @timestamp field and are managed by data lifecycle policies; in Serverless, data lifecycle simply means assigning a retention period (an example follows the data stream setup at the end of this post). Originally designed for high-volume observability and security workloads, data streams are built to scale beyond standard indices, taking full advantage of the scaling capabilities of Serverless.
  5. Fan out the client workload to trigger scaling. More requests from more clients allow your project to scale up. Expect backpressure in the form of 429 responses during scale-ups from ground zero; the retry settings in the sketch after this list help absorb these. Whether you are a 1 TB or a 1 PB per day user, the project will need to ramp up and stabilize to your workload.
  6. Use Elasticsearch Rally to benchmark ramp-up and sustained indexing throughput with a sample of your own data, or use one of the predefined workloads in the Rally tracks repo (an example invocation appears below).
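
Tips 2, 3, and 5 can be explored with a few lines of the official elasticsearch Python client before reaching for the full script described below. This is a minimal sketch, not that script: the endpoint, API key, and document shape are placeholder assumptions, and the async_bulk helper's max_retries/initial_backoff parameters retry 429-rejected documents with exponential backoff.

import asyncio
import time
from datetime import datetime, timezone

from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_bulk

ES_URL = "https://my-project.es.example.cloud:443"  # placeholder: your project endpoint
ES_API_KEY = "<api-key>"                            # placeholder: your project API key
INDEX = "advent-ds"                                 # index or data stream to target
BULK_SIZE = 1_000                                   # docs per bulk request; tune and re-measure

async def main() -> None:
    es = AsyncElasticsearch(ES_URL, api_key=ES_API_KEY)
    # Generate BULK_SIZE synthetic documents for a single bulk request
    docs = (
        {
            "_index": INDEX,
            "@timestamp": datetime.now(timezone.utc).isoformat(),
            "message": f"doc {i}",
        }
        for i in range(BULK_SIZE)
    )
    start = time.perf_counter()
    # max_retries/initial_backoff retry documents rejected with 429s, backing off exponentially
    indexed, _errors = await async_bulk(
        es, docs, max_retries=8, initial_backoff=2, max_backoff=60
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"Indexed {indexed} docs in {elapsed_ms:.0f} ms")
    await es.close()

asyncio.run(main())

Grow BULK_SIZE until the median response time sits in the 200 to 1,000 ms band, then fan out with more workers and clients.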

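For tip 6, a Rally run against an existing endpoint uses the benchmark-only pipeline. The track, host, and client options below are illustrative placeholders; check the Rally docs for the options your authentication setup requires:

esrally race --track=http_logs --pipeline=benchmark-only \
  --target-hosts=my-project.es.example.cloud:443 \
  --client-options="use_ssl:true,api_key:'<api-key>'"
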
I put together a script for experimenting with bulk request sizes and client counts. To use it, install Astral uv, set the marked configuration constants at the top of the script, then run ./async_bulk.py.
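
If you build your own version, a bare ./async_bulk.py is runnable because uv supports inline script metadata (PEP 723) plus a uv shebang. A minimal header might look like this (the Python version and dependency list are assumptions for illustration):

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["elasticsearch"]
# ///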

The script will output some stats when it finishes:

Starting indexing with 24 processes...
Documents per process: 4,166,666
Workers per process: 75


==================================================
Bulk Indexing Statistics (Multi-Process)
==================================================
Number of processes:         24
Total documents indexed:     99,999,984
Total bulk requests:         200016
Elapsed time:                237.26 seconds
Indexing rate:               421,480.34 docs/sec
Max queue depth:             10,000 documents
Min bulk response time:      204.97 ms
Median bulk response time:   1285.85 ms
Max bulk response time:      4273.16 ms
==================================================

If you are using a data stream, create an index template and the data stream first:

PUT /_index_template/advent_cal_ds_template
{
  "index_patterns": ["advent-ds*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}

PUT /_data_stream/advent-ds

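Data lifecycle on Serverless is just a retention setting (tip 4), which you can assign to the new data stream with the data stream lifecycle API. The 7d value here is only an example:

PUT /_data_stream/advent-ds/_lifecycle
{
  "data_retention": "7d"
}
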
Then target advent-ds as the index name in the script. Enjoy!