Bulk indexing is much faster with a single data node!


(Behnam Loghmani) #1

Hi

I am trying to index some docs into my Elasticsearch cluster with the bulk API and I am seeing something strange.
Indexing 19k docs with one bulk API request on a cluster containing 3 master nodes and a single data node takes about 1s, but when I add another data node with the same hardware spec, indexing time grows to 15-20s.

My index has 5 shards and 1 replica.
Reducing the replicas to 0 and setting refresh_interval to -1 doesn't help.
I figured out that if all of my index's shards are on the same data node, indexing time is about 1s, but when some primary shards are relocated to the other node, indexing time goes up to 15-20s.
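
For reference, the settings change described above can be sketched as the JSON body one would send to `PUT /<index>/_settings`. This is only a sketch of the request body, not the exact commands that were run; the helper name `bulk_load_settings` is made up for illustration. Which node holds each shard can be checked with `GET _cat/shards/<index>?v`.

```python
import json


def bulk_load_settings(replicas=0, refresh_interval="-1"):
    """Build the body for PUT /<index>/_settings (hypothetical helper)."""
    return {
        "index": {
            # 0 replicas: only primary shards receive the writes
            "number_of_replicas": replicas,
            # "-1" disables the periodic refresh while loading
            "refresh_interval": refresh_interval,
        }
    }


body = json.dumps(bulk_load_settings())
print(body)
```

Both settings are dynamic, so they can be set back to their previous values once the load is finished.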

When I use parallel bulk in the multi-data-node environment, indexing time drops to 2-4s, but I am confused: why would adding more data nodes to the cluster increase indexing time?
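
To make "parallel bulk" concrete, here is a minimal sketch of splitting the 19k docs into several smaller `_bulk` bodies and handing them to worker threads. It is an illustration under assumptions, not the actual test code: the index name `my-index`, the chunk size of 1000, the 4 workers, and the `send` placeholder (which only counts docs instead of doing an HTTP POST) are all made up. The `_type` field is included because 6.x expects a type in the bulk action line unless it is given in the URL.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def bulk_bodies(docs, index, chunk_size=1000, doc_type="_doc"):
    """Yield newline-delimited JSON bodies for POST /_bulk, chunk_size docs each."""
    for start in range(0, len(docs), chunk_size):
        lines = []
        for doc in docs[start:start + chunk_size]:
            # Action line, then the document source on the next line
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(doc))
        # The bulk body must end with a newline
        yield "\n".join(lines) + "\n"


def send(body):
    """Placeholder: a real version would POST body to /_bulk
    with Content-Type: application/x-ndjson. Here it only counts docs."""
    return body.count("\n") // 2


docs = [{"id": i, "msg": "doc %d" % i} for i in range(19000)]
bodies = list(bulk_bodies(docs, "my-index", chunk_size=1000))

# Send the chunks concurrently from a small thread pool
with ThreadPoolExecutor(max_workers=4) as pool:
    indexed = sum(pool.map(send, bodies))

print(len(bodies), indexed)
```

With the official Python client, `elasticsearch.helpers.parallel_bulk` covers the chunking and threading part.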

This is a test environment and I don't have any special settings on this cluster.
My cluster version is 6.5.1 and the data nodes have 64 GB RAM, a 31 GB heap, and 16 cores.

Thanks in advance to anyone who can help me with this.


(Simon Willnauer) #2

Something must be up with your setup. Either some replicas are not ready or you are starting your test too early. There is certainly some overhead to having replicas, but it should not be much. You can look at our nightly benchmarks to get an idea: https://elasticsearch-benchmarks.elastic.co/index.html#tracks/http-logs/nightly/30d
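
One way to rule out "starting too early" is to block on the cluster health API until the index is green and no shards are relocating before kicking off the bulk request. A rough sketch of building that request URL follows; the host, the index name `my-index`, and the 60s timeout are assumptions for illustration, and the helper name is made up.

```python
from urllib.parse import urlencode


def health_wait_url(base="http://localhost:9200", index="my-index", timeout="60s"):
    """Build a GET _cluster/health URL that waits for a settled index (hypothetical helper)."""
    params = {
        "wait_for_status": "green",               # all primaries and replicas assigned
        "wait_for_no_relocating_shards": "true",  # no shard is moving between nodes
        "timeout": timeout,                       # give up waiting after this long
    }
    return "%s/_cluster/health/%s?%s" % (base, index, urlencode(params))


url = health_wait_url()
print(url)
```

If the response comes back with `"timed_out": true`, the cluster had not settled within the timeout and the benchmark should not be started yet.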