Adding more data nodes decreased primary indexing rate

Hi There,

I have been running various capacity and performance test scenarios on our ELK cluster, and today I noticed something that doesn't make sense to me.

I had 4 data nodes and 2 coordinating nodes that I send my LS output to. Each index has 4 primary shards and 1 replica.
I was seeing a constant primary indexing rate of 3k/s coming from 3 different LS nodes.
The data is Metricbeat data, nothing beyond that.

I wanted to increase my indexing rate, so I decided to add two more data nodes (so now I have 6) and increased the number of primary shards to 6.

Any idea why, once the 2 extra data nodes were added with the exact same settings as the other 4, my indexing rate dropped from 3k/s to around 800/s?
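
For illustration, the per-index shard count is set with something along these lines; the template name and index pattern are placeholders, not necessarily our exact setup:

```
PUT _template/metricbeat_template
{
  "template": "metricbeat-*",
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}
```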

What version?
What hardware?
What JVM?
What settings?

> What version?

5.4

> What hardware?

We are running the cluster on VMs, each node on its own VM.
Each data node has 6 CPUs, 24 GB RAM, and a 12 GB heap.
Data is stored locally.

> What JVM?

java version 1.8.0_111

> What settings?

Nothing special, mostly defaults on ES. We recently increased the thread_pool bulk queue size, both with 4 data nodes and with 6 (see the snippet below).
All our LS->ES outputs are directed to the ES cluster through the coordinating nodes.
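
For reference, the queue size change is just this one line in elasticsearch.yml on each data node; the value matches what shows up as queue_size in the cat output further down, and is an example rather than a recommendation:

```
# elasticsearch.yml (ES 5.x setting name)
thread_pool.bulk.queue_size: 1000
```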

Unfortunately, with 6 nodes we see data barely streaming across, with very low resource utilization.

How are you measuring this?

What's sending data to ES?

> What's sending data to ES?

As I mentioned above, we have only Metricbeat data at this point, sending metrics from many different servers to LS (for future purposes); no filtering is done for now. All of the LS nodes then send to the ES cluster.

We have tested two scenarios, one with Kafka between the source and LS and one without Kafka, and got almost the same results.
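
The no-Kafka path is essentially a plain Beats-in, Elasticsearch-out pipeline; a minimal sketch, with placeholder hostnames for the coordinating nodes:

```
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    # placeholder hostnames for the two coordinating nodes
    hosts => ["coord-node-1:9200", "coord-node-2:9200"]
  }
}
```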

> How are you measuring this?

We use the monitoring feature in Kibana.

Can someone help me understand why, when I run the thread_pool cat API for bulk, I see the queue on almost all my data nodes empty? Is this expected behavior?

I increased the queue_size hoping to get more data through, but after running the API I realized the queue is not even reaching the default size of 200.
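
This is roughly the request I use; the h= column list simply matches the columns shown below:

```
GET _cat/thread_pool/bulk?v&h=node_name,name,active,rejected,completed,queue_size,queue,max,min,type
```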

> node_name  name active rejected completed queue_size queue max min type
> data-1   bulk      1        0   1378783       1000     0   7   7 fixed
> data-3   bulk      0        0   1278412       1000     0   7   7 fixed
> data-2   bulk      2        0   1428337       1000     0   7   7 fixed
> data-4   bulk      7        0   1403869       1000   187   7   7 fixed

That's good; it means Elasticsearch is coping with the traffic.
Try increasing the size of the bulk requests you are sending.

Thanks!

So in my case, since my data is coming through Logstash, would I be able to increase the bulk request size through the batch size and workers, or would it be something else?

I have previously looked at the Logstash elasticsearch output plugin for such a setting, to try and increase my bulk request size, but didn't see any option for that.
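
My current understanding, which may be off, is that the bulk request size is driven by the pipeline batch settings rather than by an elasticsearch output option, i.e. something like this in logstash.yml (values are only illustrative):

```
# logstash.yml - each pipeline worker flushes one batch to the output,
# so a larger batch size should mean larger bulk requests
pipeline.workers: 6
pipeline.batch.size: 1000
pipeline.batch.delay: 50
```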
