ESRally Benchmarks - More nodes = Less throughput?

Hi everyone,

I have a class project where I'm trying to analyze Elasticsearch's scalability on Docker Swarm. To do that, I'm using Rally to benchmark an ES cluster, and ideally I would like to see throughput go up as nodes are added. The problem is that a 1-node cluster achieves more throughput than a 4-node cluster.

Keep in mind that this is all running on a single machine.

My benchmark is Percolator with a small change:

{
  "operation": "percolator_with_content_president_bush",
  "clients": 5,
  "warmup-iterations": 100,
  "iterations": 1000,
  "target-throughput": 10000
}

I changed clients to 5 and target-throughput to 10000 for this query, hoping the load would be more than a single node could cope with.

The results for 1 node were:

| All | Min Throughput | percolator_with_content_president_bush | 166.65 | ops/s |
| All | Median Throughput | percolator_with_content_president_bush | 245.29 | ops/s |
| All | Max Throughput | percolator_with_content_president_bush | 278.72 | ops/s |

While 4 nodes only managed this:

| All | Min Throughput | percolator_with_content_president_bush | 86.34 | ops/s |
| All | Median Throughput | percolator_with_content_president_bush | 161.64 | ops/s |
| All | Max Throughput | percolator_with_content_president_bush | 195.55 | ops/s |

My PC has 32 GB of RAM and 8 CPUs, so it should be able to handle it.

Any thoughts?

If a single node can saturate the resources of the machine, then adding additional nodes will not give any additional throughput, as the total amount of resources stays the same. You are instead likely to see a decrease, because there is more overhead and communication required between the nodes.

The whole point of scaling out is to add more resources to the cluster, which you are not doing here.


Thanks for the reply Christian.

So for the sake of the project, is there a way I can "fake" scaling out by limiting the resources each single ES instance can access?

Yes, that might be an option.

Do you have any recommendation on how to do that? Should I do it in ES or in Docker? Also, which usually has the greater impact, memory or CPU?

You need to do that in Docker. Give each node a share of RAM and CPU.
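A minimal sketch of what that could look like in a Swarm stack file: the `deploy.resources.limits` section caps each replica's CPU and memory when deployed with `docker stack deploy`. The service name, image tag, and the specific limit values here are illustrative assumptions, not taken from the thread; note that the JVM heap (`ES_JAVA_OPTS`) should also be set well below the container memory limit so the node doesn't get OOM-killed.

```yaml
version: "3.8"
services:
  es:
    # Assumed image tag - use whichever ES version you are benchmarking
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      # Keep the JVM heap well under the 2g container limit (rough 50% rule)
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    deploy:
      resources:
        limits:
          cpus: "2"     # at most 2 of the host's 8 CPUs per node
          memory: 2g    # hard memory cap for the container
```

With limits like these, a single node can no longer saturate the whole machine, so adding nodes (each with its own slice of CPU and RAM) should actually add usable capacity to the cluster.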

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.