Is the input speed of the elasticsearch cluster linear?

nathaniel · June 26, 2015, 9:33pm

Hello guys, I am a newbie to Elasticsearch. I recently set up an Elasticsearch cluster to store my logs. The cluster has 3 nodes, all of which serve as both master and data nodes. I use Logstash to forward data to an AWS load balancer, which then spreads the load across the 3 nodes.

The problem is that the throughput is really low, and I also noticed that it does not scale linearly. With one node in the cluster, the indexing throughput is 3666 events/sec. With two nodes, it is 2750 events/sec per node. With all three nodes, it is only 1666 events/sec per node.
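To make the numbers above easier to compare, here is a minimal sketch that turns the per-node rates into cluster-wide totals and a scaling efficiency relative to the single-node case. The function name and the efficiency definition are illustrative choices, not part of the original measurements.

```python
# Per-node indexing rates reported in the post, keyed by cluster size.
PER_NODE_RATE = {1: 3666, 2: 2750, 3: 1666}


def cluster_totals(per_node_rate):
    """Return {nodes: total events/sec} for each measured cluster size."""
    return {nodes: nodes * rate for nodes, rate in per_node_rate.items()}


def scaling_efficiency(per_node_rate):
    """Return {nodes: fraction of ideal linear scaling}, where 1.0 is perfectly linear.

    Ideal scaling is assumed to be nodes * single-node rate.
    """
    baseline = per_node_rate[1]
    return {nodes: rate / baseline for nodes, rate in per_node_rate.items()}


totals = cluster_totals(PER_NODE_RATE)
efficiency = scaling_efficiency(PER_NODE_RATE)
for nodes in sorted(totals):
    print(f"{nodes} node(s): {totals[nodes]} events/sec total, "
          f"{efficiency[nodes]:.0%} of linear scaling")
```

The totals work out to 3666, 5500, and 4998 events/sec, so in these measurements adding the third node lowers the cluster-wide rate rather than raising it.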

Is the performance supposed to be like this? Or should the throughput linearly increase when adding nodes into the cluster? Hope you guys can give me some help. Thanks in advance.

warkolm · June 26, 2015, 10:52pm

It should increase by adding more nodes.

What does your LS output config look like? What instance size are you using?

nathaniel · June 29, 2015, 5:00pm

Thanks for answering, warkolm.
My current situation is that I need to process at least 34000 events per second, and I am trying to estimate how many nodes I need to reach that rate. I noticed that throughput does not increase linearly with the number of nodes, so I wonder how to work out the number of nodes I need.

Here is my LS output:

output {
  elasticsearch {
    bind_host => "localhost"
    cluster => "elasticsearch-prodII"
    workers => 8
    protocol => "http"
    flush_size => 200
  }
}

Christian_Dahlqvist · June 29, 2015, 5:20pm

If you have the standard 1 replica configured, I would expect 2 nodes to perform about the same as a single node as both nodes would hold and index all data. Once you add nodes from there on, you should see improved throughput assuming you do not change the number of replicas.
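A rough sketch of the reasoning above, under two simplifying assumptions that are mine rather than measured: shard copies are spread evenly across nodes, and indexing a document into a replica costs about as much as indexing it into the primary. The function name is hypothetical.

```python
def indexing_load_per_node(events_per_sec, nodes, replicas=1):
    """Estimate how many events each node must index per second.

    Assumptions (not measured): shard copies are evenly balanced, and a
    replica write costs the same as a primary write. A replica is never
    placed on the same node as its primary, so with fewer nodes than
    copies the extra replicas stay unassigned and cost nothing.
    """
    copies = min(1 + replicas, nodes)  # copies that can actually be allocated
    return events_per_sec * copies / nodes


for n in (1, 2, 3, 4, 6):
    load = indexing_load_per_node(3000, n)
    print(f"{n} node(s): about {load:.0f} events/sec indexed per node")
```

With 1 replica, the per-node load is the same for 1 and 2 nodes and only starts to drop from the third node on, which matches the expectation described above.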

This also assumes that you are sending traffic to all nodes and that the nodes are not deployed on a single host where they fight for the same resources.

Your output configuration seems to suggest that you are sending all load to a single node. Is that the case? What type of hardware configuration are you using?

nathaniel · June 29, 2015, 5:39pm

Hello Christian, thanks so much for answering.

My setup is three nodes, each running a Redis server, a Logstash instance, and an Elasticsearch instance, and the three nodes are in the same Elasticsearch cluster. All three act as both data node and master node. All the input comes from an AWS load balancer, which distributes the load equally across the three nodes' Redis servers. Logstash reads the data from the local Redis server and then sends it to its local Elasticsearch. This is why my LS output is sending to localhost.

To sum up, I'm using 3 nodes as a cluster, each running Logstash and Elasticsearch. Each Logstash sends its data to the local Elasticsearch.

I am truly a newbie and should learn more about this. Do you think this structure works?

As for the hardware configuration, I am using AWS m4.xlarge instances, which have 4 cores and 16 GB of RAM. AWS says the m4 instances use Intel Xeon processors, but they are VMs after all, so don't raise your expectations.

Thanks for answering!

Christian_Dahlqvist · June 29, 2015, 6:07pm

I would recommend monitoring the servers during indexing and trying to identify what is the bottleneck and limiting performance, e.g. CPU, disk I/O, memory pressure. As both Logstash and indexing in Elasticsearch can be CPU intensive, that would be what I check first.

nathaniel · June 29, 2015, 6:09pm

Great advice. Thanks so much Christian.
I noticed that the CPU is the bottleneck. It is nearly fully occupied the whole time.

nik9000 · June 29, 2015, 6:24pm

The hot_threads will be useful there, I think.
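For reference, a minimal sketch of calling the nodes hot threads endpoint (`GET /_nodes/hot_threads`) from Python's standard library. The host, port, and parameter values are assumptions for a node listening locally on the default HTTP port; adjust them for your cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(host="localhost", port=9200, threads=3, interval="500ms"):
    """Build the URL for the nodes hot threads API.

    host/port are assumed defaults; threads and interval are the API's
    query parameters for how many threads to report and the sampling window.
    """
    query = urlencode({"threads": threads, "interval": interval})
    return f"http://{host}:{port}/_nodes/hot_threads?{query}"


def fetch_hot_threads(url, timeout=10):
    """Fetch the plain-text hot threads report from a running node."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage against a running node (needs network access to the cluster):
#   print(fetch_hot_threads(hot_threads_url()))
```

Running this while indexing is under way should show which threads are using the CPU, for example whether the time goes to bulk indexing or to segment merges.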

Usually the right way to get optimal performance is to have one copy of a shard per server, so if you have 6 servers you want your index in 3 shards with 1 replica, for 6 ((1+1)*3) shard copies in total. There are lots of parameters you can tweak to sacrifice query performance for indexing performance. Have a look at "update index settings". hot_threads will probably lead you somewhere though.
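The shard arithmetic above can be sketched as follows, together with an example body for the update index settings API (`PUT /<index>/_settings`). The helper names are hypothetical, and the `30s` refresh interval is only an illustrative value for trading search freshness for indexing speed, not a recommendation from this thread.

```python
import json


def shard_layout(servers, replicas=1):
    """Apply the 'one shard copy per server' rule of thumb.

    Returns (primary_shards, total_shard_copies). If servers is not a
    multiple of (1 + replicas), some servers end up without a copy.
    """
    copies_per_shard = 1 + replicas
    primaries = max(1, servers // copies_per_shard)
    return primaries, primaries * copies_per_shard


def indexing_friendly_settings(refresh_interval="30s"):
    """Build a JSON body for the update index settings API.

    refresh_interval is a dynamic index setting; a longer interval means
    fewer refreshes during heavy indexing. The value here is an assumption.
    """
    return json.dumps({"index": {"refresh_interval": refresh_interval}})


print(shard_layout(6))               # 6 servers, 1 replica
print(indexing_friendly_settings())  # body to send with PUT /<index>/_settings
```

Note that the number of primary shards is fixed when the index is created, so the layout has to be chosen up front (for daily log indices, for example in an index template), while the replica count and refresh interval can be changed later.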

nathaniel · June 29, 2015, 6:28pm

Thanks so much Nik! You pointed out a new direction for me. I'll go check it out.
