ESRally Benchmarks - More nodes = Less throughput?

Hi everyone,

I have a class project where I'm trying to analyze Elasticsearch's scalability on Docker Swarm. To do that, I'm using Rally to benchmark an ES cluster, and ideally I would like to see throughput go up as nodes are added. The problem is that a 1-node cluster achieves more throughput than a 4-node cluster.

Keep in mind that this is all running on a single machine.

My benchmark is Percolator with a small change:

{
  "operation": "percolator_with_content_president_bush",
  "clients": 5,
  "warmup-iterations": 100,
  "iterations": 1000,
  "target-throughput": 10000
}

I changed clients to 5 and target-throughput to 10000 for this query, hoping the load would be more than a single node could cope with.

The results for 1 node were:

| All | Min Throughput | percolator_with_content_president_bush | 166.65 | ops/s |
| All | Median Throughput | percolator_with_content_president_bush | 245.29 | ops/s |
| All | Max Throughput | percolator_with_content_president_bush | 278.72 | ops/s |

While 4 nodes only managed this:

| All | Min Throughput | percolator_with_content_president_bush | 86.34 | ops/s |
| All | Median Throughput | percolator_with_content_president_bush | 161.64 | ops/s |
| All | Max Throughput | percolator_with_content_president_bush | 195.55 | ops/s |

My PC has 32 GB of RAM and 8 CPUs, so it should be able to handle it.

Any thoughts?

If a single node can saturate the resources of the machine, then adding additional nodes will not give any additional throughput, as the total amount of resources stays the same. You are instead likely to see a decrease, because there is more overhead and communication required between the nodes.

The whole point of scaling out is to add more resources to the cluster, which you are not doing here.


Thanks for the reply Christian.

So for the sake of the project, is there a way I can "fake" scaling out by limiting the resources each single ES instance can access?

Yes, that might be an option.

Do you have any recommendation on how to do that? Should I do it in ES or in Docker? Also, which usually has the greater impact, memory or CPU?

You need to do that in Docker. Give each node a share of RAM and CPU.
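A minimal sketch of what that could look like in a Swarm stack file: the `deploy.resources.limits` section caps each replica's CPU and memory when deployed with `docker stack deploy`. The service name, image tag, and the specific limit values here are illustrative assumptions, not taken from the thread; note that the JVM heap (`ES_JAVA_OPTS`) should also be set well below the container memory limit so the node doesn't get OOM-killed.

```yaml
version: "3.8"
services:
  es:
    # Assumed image tag - use whichever ES version you are benchmarking
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      # Keep the JVM heap well under the 2g container limit (rough 50% rule)
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    deploy:
      resources:
        limits:
          cpus: "2"     # at most 2 of the host's 8 CPUs per node
          memory: 2g    # hard memory cap for the container
```

With limits like these, a single node can no longer saturate the whole machine, so adding nodes (each with its own slice of CPU and RAM) should actually add usable capacity to the cluster.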

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.