Bulk indexing is much faster with a single data node!

behi_1370 · December 1, 2018, 11:41pm

Hi

I am trying to index some docs in my Elasticsearch cluster with the bulk API and I am seeing something strange.
Indexing 19k docs with a single bulk API request on a cluster with 3 master nodes and a single data node takes about 1s, but when I add a second data node with the same hardware spec, indexing time grows to 15-20s.

My index is configured with 5 shards and 1 replica.
Reducing replicas to 0 and setting refresh_interval to -1 doesn't help.
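
For reference, the settings described above would look roughly like the request bodies below. This is only a sketch: the index name `my-index` is hypothetical, and the bodies are built as Python dicts purely so they can be printed as JSON. `number_of_shards` is set when the index is created, while `number_of_replicas` and `refresh_interval` can be changed on a live index.

```python
import json

# Body for index creation (PUT /my-index). "my-index" is a hypothetical name.
create_body = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
        }
    }
}

# Body for the dynamic settings tried during the test (PUT /my-index/_settings):
# no replicas, and periodic refresh disabled.
tuning_body = {
    "index": {
        "number_of_replicas": 0,
        "refresh_interval": "-1",
    }
}

print(json.dumps(create_body, indent=2))
print(json.dumps(tuning_body, indent=2))
```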
I found that when all of the index's shards are on the same data node, indexing takes about 1s, but as soon as some primary shards are relocated to the other node, indexing time goes up to 15-20s.
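
One way to confirm where the primaries live is the `_cat/shards` API, for example `GET /_cat/shards/my-index?h=index,shard,prirep,state,node`. The sketch below only parses that kind of output and counts started primaries per node. The index name, the node names `data-1` and `data-2`, and the sample text are all made up for illustration and are not taken from the actual cluster.

```python
from collections import Counter

def primaries_per_node(cat_shards_text):
    """Count STARTED primary shards per node.

    Expects _cat/shards output requested with
    h=index,shard,prirep,state,node (one shard copy per line).
    """
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # e.g. UNASSIGNED copies have no node column
        prirep, state = parts[2], parts[3]
        node = " ".join(parts[4:])
        if prirep == "p" and state == "STARTED":
            counts[node] += 1
    return counts

# Hypothetical output for a 5-shard, 1-replica index spread over two data nodes.
sample = """
my-index 0 p STARTED data-1
my-index 0 r STARTED data-2
my-index 1 p STARTED data-2
my-index 1 r STARTED data-1
my-index 2 p STARTED data-1
my-index 2 r STARTED data-2
my-index 3 p STARTED data-2
my-index 3 r STARTED data-1
my-index 4 p STARTED data-1
my-index 4 r STARTED data-2
"""

for node, count in sorted(primaries_per_node(sample).items()):
    print(node, count)
```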

When I use parallel bulk in the multi-data-node environment, indexing time drops to 2-4s, but I'm confused: why would adding more data nodes to the cluster increase indexing time?
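
To make "parallel bulk" concrete, here is a minimal standard-library sketch of the idea: split the docs into chunks, build one newline-delimited bulk body per chunk, and send the bodies from a small thread pool. It is not the code used in the test above. The names `chunks`, `bulk_body`, `parallel_bulk`, `fake_send` and the index name `my-index` are hypothetical, and `fake_send` merely stands in for a real `POST /_bulk` call. The official Python client also ships a `helpers.parallel_bulk` helper, which this sketch does not use.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def chunks(docs, size):
    """Yield consecutive slices of at most `size` docs."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

def bulk_body(index, batch, doc_type="_doc"):
    """Build an NDJSON bulk body: one action line plus one source line per doc.

    Elasticsearch 6.x still expects a type, so "_type" is included here.
    """
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline

def parallel_bulk(index, docs, send, chunk_size=1000, workers=4):
    """Send one bulk body per chunk from a thread pool; results keep chunk order."""
    bodies = [bulk_body(index, batch) for batch in chunks(docs, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, bodies))

def fake_send(body):
    """Stand-in for POST /_bulk: just reports how many docs the body carries."""
    return body.count("\n") // 2

docs = [{"n": i} for i in range(19000)]
results = parallel_bulk("my-index", docs, fake_send, chunk_size=1000, workers=4)
print(len(results), "requests,", sum(results), "docs")
```

In a real run, `send` would post each body to `/_bulk` with the `Content-Type: application/x-ndjson` header and inspect the `errors` flag in the response.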

This is a test environment and I don't have any special settings on this cluster.
My cluster version is 6.5.1, and the data nodes have 64 GB RAM, a 31 GB heap, and 16 cores.

Thanks to anyone who can help with this.

s1monw · December 7, 2018, 3:30pm

Something must be up with your setup. Either some replicas are not ready, or you are starting your test too early. There is certainly some overhead to having replicas, but it should not be much. You can look at our nightly benchmarks to get an idea: https://elasticsearch-benchmarks.elastic.co/index.html#tracks/http-logs/nightly/30d

