Index performance does not increase linearly

(Yuxuan Zong) #1

Hi, all,
I want to know whether indexing speed increases linearly with the number of nodes. For example, 100 docs/s with only one node in the cluster and 400 docs/s with 4 nodes in the cluster. (Only one primary shard per node, without replication, and the nodes are configured almost identically.)
I use JMeter to test this.
First, 1 master node, 1 coordinating node and 1 data node in the cluster, and one index with one primary shard and no replication. JMeter sends bulk requests to the coordinating node, and I record the indexing rate once the data node's active bulk threads are at the maximum and its queue is not empty. For example, 200 docs/s.
Second, 1 master node, 1 coordinating node and 2 data nodes in the cluster, recording the indexing rate under the same condition (active bulk threads at the maximum and queue not empty). For example, 350 docs/s.
Third, 1 master node, 1 coordinating node and 4 data nodes in the cluster, recording the indexing rate under the same condition. For example, 600 docs/s.
It seems that indexing performance does not increase linearly as nodes are added.
Why is that? And is there a way to solve this problem?
Thank you.
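
To put numbers on "not linear", here is a minimal sketch that computes speedup and scaling efficiency from the three example rates quoted above. The function name `scaling_report` is only illustrative, and the rates are the example figures from this post, not a controlled benchmark.

```python
# Example throughput figures from the post above: data nodes -> docs/s.
measured = {1: 200, 2: 350, 4: 600}

def scaling_report(measured):
    """Return (nodes, speedup, efficiency) tuples relative to the smallest cluster."""
    base_nodes = min(measured)
    base_rate = measured[base_nodes]
    report = []
    for nodes in sorted(measured):
        speedup = measured[nodes] / base_rate   # observed gain over the baseline
        ideal = nodes / base_nodes              # gain under perfect linear scaling
        report.append((nodes, speedup, speedup / ideal))
    return report

for nodes, speedup, efficiency in scaling_report(measured):
    print(f"{nodes} data node(s): speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

With these figures, 2 data nodes give a 1.75x speedup (87.5% efficiency) and 4 data nodes give 3x (75% efficiency).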

(Christian Dahlqvist) #2

Those numbers sound very low. What is the specification of the hardware the cluster is running on? How are the nodes configured? What do you mean by the statement that the nodes are configured almost the same? What type of data are you indexing? How many concurrent indexing threads/processes are you using? Does this vary across the tests? What is the bulk size you are using?

(Yuxuan Zong) #3

Thank you for your reply.
The cluster nodes have the same CPU type and core count (32 cores) and the same disks (RAID 0). Memory is not the same, but every node has more than 64 GB (hardware, not the ES process).
Every node has only 1 primary shard without replication, and the other settings are defaults.

The bulk request looks like this:
{ "index" : { "_index" : "website", "_type" : "blog" } }
{ "f1" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f2" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f3" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f4" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f5" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f6" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f7" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f8" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f9" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}", "f10" : "${__RandomString(100,abcdefghijklmnopqrstuvwxyz,)}"}
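
For anyone who wants to reproduce this payload without JMeter, here is a rough Python sketch that builds a similar bulk body. It assumes the bulk API's newline-delimited JSON format (one action line followed by one document line, with a trailing newline). The helper names `random_string` and `build_bulk_body` and the `seed` parameter are only illustrative.

```python
import json
import random
import string

def random_string(length, rng):
    """Random lowercase string, similar to JMeter's __RandomString with a-z."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def build_bulk_body(num_docs, num_fields=10, value_size=100, seed=None):
    """Build an NDJSON bulk body with num_docs index actions."""
    rng = random.Random(seed)
    # No "_id" in the action line, so Elasticsearch assigns document ids itself.
    action = json.dumps({"index": {"_index": "website", "_type": "blog"}})
    lines = []
    for _ in range(num_docs):
        doc = {f"f{i}": random_string(value_size, rng)
               for i in range(1, num_fields + 1)}
        lines.append(action)
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body(num_docs=500, seed=42)
print(len(body.encode("utf-8")), "bytes for 500 documents")
```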

Of course, I have varied the number of fields (1, 10 or 100), the value size (100, 1000 or 10000) and the number of index operations (1, 500 or 1000) in one bulk request during testing.

Indexing speed varies a lot with the documents being indexed: it depends on document size, number of fields, number of JMeter request threads, JVM heap size and so on. 200 docs/s is an extreme case, with just 4 GB of memory for the ES process and heavy requests for more than 5 hours. In other situations it can also reach 150,000 docs/s/node.

I ran the test many times with the different parameters above, and recorded the indexing rate after the data node's active bulk threads had been at the maximum for a while, such as 30 minutes (I think this represents the highest indexing speed of that node). The speed does not increase linearly with the number of nodes.

(Christian Dahlqvist) #4

This is expected, as the amount of work that needs to be performed per document varies a lot. This is the reason documents/second is generally not a very useful metric when comparing performance across use cases.

I also have a few further questions:

  • Are you letting Elasticsearch assign the document id automatically?
  • How many concurrent indexing threads are you using? Do you change this as the number of data nodes increases? What is the bulk size you are using?
  • 4GB heap sounds low for this type of indexing workload. Are you monitoring GC to verify that the nodes are not suffering from frequent or slow GC due to this?
  • I assume all your traffic is going through the one dedicated coordinating-only node. Is this the same specification as the other nodes? Are you monitoring GC on this node to verify that it is not suffering from frequent or slow GC?
  • Indexing random data like this is not very realistic, and will not provide realistic results. Why are you not using more realistic data?
  • What type of network do you have in place between the nodes? How have you verified that this is not a limiting factor, e.g. on the coordinating-only node?
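
As a rough illustration of the GC and queue monitoring asked about above, the sketch below summarises a node stats response per node. It assumes the parsed JSON from `GET /_nodes/stats/jvm,thread_pool` has already been fetched, and that the indexing thread pool is called `bulk` (older versions) or `write` (newer versions). The function name `summarize_node_stats` is hypothetical.

```python
def summarize_node_stats(stats):
    """Summarise GC activity and bulk thread pool state for each node.

    `stats` is the parsed JSON of GET /_nodes/stats/jvm,thread_pool.
    """
    summary = {}
    for node in stats["nodes"].values():
        collectors = node["jvm"]["gc"]["collectors"]
        pools = node["thread_pool"]
        # Assumption: the pool is named "bulk" or "write" depending on version.
        pool = pools.get("bulk") or pools.get("write") or {}
        summary[node["name"]] = {
            "gc_count": sum(c["collection_count"] for c in collectors.values()),
            "gc_millis": sum(c["collection_time_in_millis"] for c in collectors.values()),
            "bulk_active": pool.get("active", 0),
            "bulk_queue": pool.get("queue", 0),
            "bulk_rejected": pool.get("rejected", 0),
        }
    return summary
```

Sampling this at the start and end of a run and comparing the differences for the coordinating-only node and the data nodes shows where GC time and rejections accumulate.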

(Yuxuan Zong) #5

  1. The document id is assigned automatically:
    { "index" : { "_index" : "website", "_type" : "blog" } }
  2. The number of JMeter bulk request threads increases with the number of data nodes: 40 threads per node. Bulk size is 10M.
  3. 4GB is just one test scenario; I also set it to 31GB in other tests. Of course, data nodes GC frequently with 4GB.
  4. The coordinating node has the same configuration, and its GC is very slow compared to the data nodes.
  5. The internal network is working well, without limits.
  6. I index random data just for convenience. If documents/second is not a very useful metric, which metric can I use to test whether performance increases linearly?

In fact, I just want to test whether multi-node performance grows linearly, rather than the highest single-node performance.
Thank you.

(Christian Dahlqvist) #6

That is a larger bulk size than usually recommended. Increasing this will naturally also increase load on the coordinating node, which could become a problem.

Please provide data and settings for a set of coordinated runs. Mixing data and numbers from different runs like this just wastes time.

Then that seems to be a problem. Maybe you need to add another coordinating-only node to ensure this is not the bottleneck?

Every network has a limit. Are you using a 10G network? A gigabit network?

You may also find this Elastic{ON} talk useful.

(Yuxuan Zong) #7

OK, thank you for your advice, I will try again later.

(Christian Dahlqvist) #8

For our nightly benchmarks we use a cluster with a 10G network, as a gigabit network can often be the limiting factor for some of these heavy indexing loads. As you are increasing the number of indexing threads/processes, you should also look out for the load test driver becoming a bottleneck.
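
A back-of-the-envelope check of the network point, as a sketch only: it assumes the 10-field, 100-character documents from post #3 serialised as compact JSON, and it ignores the bulk action lines, HTTP overhead and any compression. The 200 and 150,000 docs/s rates are the figures mentioned earlier in the thread. The helper names are illustrative.

```python
import json

def doc_bytes(num_fields=10, value_size=100):
    """Size in bytes of one test document serialised as compact JSON."""
    doc = {f"f{i}": "a" * value_size for i in range(1, num_fields + 1)}
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

def required_gbit_per_s(docs_per_s, bytes_per_doc):
    """Raw payload bandwidth needed to carry docs_per_s documents."""
    return docs_per_s * bytes_per_doc * 8 / 1e9

size = doc_bytes()
for rate in (200, 150_000):
    print(f"{rate} docs/s x {size} bytes = "
          f"{required_gbit_per_s(rate, size):.3f} Gbit/s")
```

Under these assumptions each document is a little over 1 KB, so 150,000 docs/s is roughly 1.3 Gbit/s of payload, more than a single gigabit link can carry, while 200 docs/s is nowhere near the limit.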
