Elastic IoT - Facing Ingest Speed "Light Barrier" at 10K docs/sec

We've set up a bare-metal Elastic cluster with high CPU, maximum RAM, and fast local flash storage in the hot tier, and followed all the official preflight checklists and write optimization guides:

  • 3 Dedicated Master Nodes,
  • 2 Dedicated Coordinating Nodes (incoming data, no ingest pipelines),
  • 2 Dedicated Coordinating Nodes (querying data via Kibana),
  • 10 Dedicated Data Nodes with time-based indices (no direct access) (2 hot / 4 warm / 4 cold)

While we had a good start, as the data volume grows we're now facing a "light barrier" for ingest speed. The input sources are microservices using the bulk API; varying the service count and the bulk size had no significant effect. We checked IOPS, network, CPU, and RAM, and we don't see anything indicating a bottleneck on the master, coordinating, or hot data nodes.
The average document has fewer than 100 fields, with no full-text fields but many numeric or keyword fields with limited variance due to the devices in the field.
Disabling replicas gave us only a minimal improvement at a high cost in resilience. Increasing flush_interval beyond 60 s has no effect. The shard size is between 35 and 50 GB. The number of primary shards is 2 in order to distribute the workload. Adding 2 more hot nodes and increasing to 4 primaries had only a minimal effect. Evaluating smaller or larger index intervals (hourly, weekly) with a correspondingly resized primary count showed no benefit. No hot threads, thread pools are empty, segments look fine, GC too, ...
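For reference, the index-level knobs mentioned above can be written down as an index creation body. This is only a sketch under assumptions: it assumes that "flush_interval" above refers to the `index.refresh_interval` setting (the post does not say so), and the function name `index_creation_body` and the example values are illustrative, not our production configuration.

```python
import json


def index_creation_body(primaries: int, replicas: int, refresh_interval: str) -> dict:
    """Build a request body for creating an index (PUT /<index>).

    number_of_shards is a static setting, so it has to be supplied at
    creation time; replicas and refresh_interval can also be changed later.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
                # Assumption: this is the interval the post calls "flush_interval".
                "refresh_interval": refresh_interval,
            }
        }
    }


# Illustrative values taken from the description: 2 primaries, replicas off, 60 s.
body = index_creation_body(primaries=2, replicas=0, refresh_interval="60s")
print(json.dumps(body, indent=2))
```

Keeping the body in one place makes it easier to vary a single setting per test run and compare the resulting ingest rate.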

And even a minimal test setup (a single node, one index, no replicas) shows the very same barrier at 10K docs/sec.
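To make the minimal test reproducible, the bulk request bodies can be generated with a few lines of standard-library Python. This is a rough sketch, not our actual microservice code: the index name `iot-test`, the sample documents, and the helpers `chunked` and `bulk_body` are hypothetical. It only builds the newline-delimited JSON that the `_bulk` endpoint expects (one action line followed by one source line per document); sending it over HTTP is left out.

```python
import json
from typing import Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive batches of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Serialize documents as NDJSON for the _bulk API.

    Each document becomes an "index" action line plus its source line,
    and the body must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical sample documents standing in for device readings.
docs = [{"device_id": f"dev-{i}", "temperature": 20 + i} for i in range(5)]
bodies = [bulk_body("iot-test", batch) for batch in chunked(docs, 2)]
print(len(bodies), "bulk requests prepared")
```

Varying the `size` argument is the equivalent of varying the bulk size in the tests described above.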

Now we're stuck at a kind of "light barrier" and probably need to tweak more advanced settings. Is there any checklist, documentation, or how-to out there? Any hint on what to look into?

Any tips on what to try next?

Which version of Elasticsearch are you using? What is the specification of the hosts used at the different tiers? What is the average document size? How many indices and shards are you actively indexing into? How many clients do you have indexing into the cluster?
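One way to answer the document-size question is to measure the serialized size of a few representative documents and convert the docs/sec figure into a byte rate. The following is a minimal sketch; the helper names and the sample document are hypothetical, and the size of the compact JSON is only an approximation of what actually goes over the wire.

```python
import json
from typing import Iterable


def avg_doc_bytes(docs: Iterable[dict]) -> float:
    """Average size in bytes of the documents when serialized as compact JSON."""
    sizes = [
        len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        for doc in docs
    ]
    return sum(sizes) / len(sizes)


def ingest_mb_per_sec(avg_bytes: float, docs_per_sec: float) -> float:
    """Convert a document rate into megabytes (10^6 bytes) per second."""
    return avg_bytes * docs_per_sec / 1_000_000


# Hypothetical sample; replace with real documents pulled from the services.
sample = [{"device_id": "dev-1", "temperature": 21.5, "status": "ok"}]
size = avg_doc_bytes(sample)
print(f"average document size: {size:.0f} bytes")
print(f"rate at 10K docs/sec: {ingest_mb_per_sec(size, 10_000):.2f} MB/s")
```

Comparing that byte rate with the disk and network figures already collected shows how far the cluster is from the hardware limits.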
