Why is Elastic's performance better with 1 data node than 2?

Greetings to the Elastic community. I have a question that I hope you can help me answer, and which I explain below. Please keep in mind that I am asking because I have only just started with Elastic.

Well, I have an Elastic cluster with 3 nodes, all with different characteristics:

Node1 (master only):

  • Intel® Core™2 Duo CPU E7500 @ 2.93GHz × 2
  • 4GB RAM
  • 400GB Disk
  • Ubuntu 16.04 64-bit LTS

Node2 (data only):

  • Intel® Core™2 Duo CPU E7500 @ 2.93GHz × 2
  • 4GB RAM
  • 400GB Disk
  • Ubuntu 12.04 32-bit LTS

Node3 (data only):

  • Intel® Core™ i5 CPU M 430 @ 2.27GHz × 4
  • 4GB RAM
  • 50GB Disk (because it is partitioned)
  • Ubuntu 16.04 64-bit LTS

I should note that this is a university project, and I do not know how to explain what I describe next. The cluster works correctly for both indexing and queries. I have stored about 141 million documents, well balanced across both data nodes, with 5 primary shards and 1 replica of each to guarantee availability. For loading and manipulating the data I am using an application written in Node.js. So far, everything works well. My teacher has asked me to run performance tests on the cluster, in order to verify that the Elastic cluster performs better under the premise of "with more nodes, better response times".
When running these tests (against the same URL of the app), I have 3 cases.

  1. With all active nodes (Node1, Node2, Node3).
  2. Only with Node1 and Node2 active.
  3. Only with Node1 and Node3 active.
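To make the three cases concrete, here is a toy sketch of how the shard copies could be laid out in each one. It is not Elasticsearch code: the `layout` function and its rotation scheme are made up for illustration, and it only assumes that two copies of the same shard are never placed on the same node, and that Node1 (master only) holds no data.

```python
# Toy model of shard-copy placement; NOT Elasticsearch code.
PRIMARIES = 5
REPLICAS_PER_PRIMARY = 1

def layout(data_nodes):
    """Return {node: [shard copies]} for the given active data nodes.

    Assumption: two copies of the same shard never share a node, so at most
    len(data_nodes) copies of each shard can be assigned.
    """
    copies_per_shard = min(1 + REPLICAS_PER_PRIMARY, len(data_nodes))
    placement = {node: [] for node in data_nodes}
    for shard in range(PRIMARIES):
        for copy in range(copies_per_shard):
            # Rotate the starting node so primaries are spread across nodes.
            node = data_nodes[(shard + copy) % len(data_nodes)]
            role = "P" if copy == 0 else "R"
            placement[node].append(f"{shard}{role}")
    return placement

cases = {
    "case 1 (Node2 + Node3)": ["Node2", "Node3"],
    "case 2 (Node2 only)": ["Node2"],
    "case 3 (Node3 only)": ["Node3"],
}
for name, nodes in cases.items():
    print(name, layout(nodes))
```

Under these assumptions every active data node holds one copy of all 5 shards in each case, so in cases 2 and 3 a single machine has to search the whole index by itself.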

After running the tests with JMeter using a single thread of execution, I reached the following conclusions:

  • With all the nodes running, I get somewhat high response times (19892 ms). Given the large amount of data, the difficulty of the query and the limited resources I have, that is not a problem, and the queries work correctly.

  • Only with Node1 and Node2, the average response times (55471 ms) are much higher, thus verifying the premise mentioned by my teacher.

  • BUT THIS IS THE POINT WHERE MY LOGIC DOES NOT FIT. When running the tests only with Node1 and Node3, the average response times (14377 ms) are LOWER than those in both of the first two cases.
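For a quick side-by-side view, the three figures above can be expressed relative to the fastest case. This is only arithmetic on the numbers already reported; the dictionary keys are just labels for the three cases.

```python
# Response times reported above, in milliseconds.
times_ms = {
    "Node1+Node2+Node3": 19892,
    "Node1+Node2": 55471,
    "Node1+Node3": 14377,
}

fastest = min(times_ms.values())
# List the cases from fastest to slowest with their ratio to the fastest one.
for case, ms in sorted(times_ms.items(), key=lambda kv: kv[1]):
    print(f"{case}: {ms} ms, {ms / fastest:.2f}x the fastest case")
```

The case with only Node2 as data node comes out at roughly 3.9 times the case with only Node3, and the full cluster at roughly 1.4 times.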

HELP: I need to know why this happens.

I would expect the times in the third case to be similar to those in the second case, and higher than those in the first case, since, just as in the second case, only ONE data node is working.

I have a hypothesis, but I do not think it is the whole story: this "phenomenon" could be due to the resources of Node3. As far as the processor is concerned it is "somewhat better", since it is a 4-core i5, but I am not sure that explains it.

Thanks in advance.

Are you testing with 1 instance of JMeter with 1 thread or multiple JMeters with 1 thread?

With 1 JMeter and 1 thread, you are testing the machine, not the cluster.
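To illustrate the difference, here is a minimal sketch of a load test that sends requests from several threads at once instead of one. It is not a JMeter replacement: `fake_query` is a hypothetical stand-in that just sleeps, and in a real test it would be replaced by an HTTP call to the app's URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query():
    """Hypothetical stand-in for one HTTP request to the app."""
    time.sleep(0.01)  # pretend the query takes about 10 ms

def run_load(query_fn, workers, requests_per_worker):
    """Send requests from `workers` concurrent threads.

    Returns (average latency in seconds, throughput in requests per second).
    """
    def worker():
        latencies = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            query_fn()
            latencies.append(time.perf_counter() - start)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        latencies = [t for f in futures for t in f.result()]
    elapsed = time.perf_counter() - started
    return sum(latencies) / len(latencies), len(latencies) / elapsed

for workers in (1, 4):
    avg, rps = run_load(fake_query, workers, requests_per_worker=5)
    print(f"{workers} thread(s): avg {avg * 1000:.1f} ms, {rps:.1f} req/s")
```

With a single thread only one request is in flight at a time, so the result mostly reflects how fast one machine answers one query. Raising the number of threads and watching both latency and throughput gives a better picture of how the cluster behaves under concurrent load.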
