Indexing rate performance in cluster

Chang_Oh_Heo · September 12, 2016, 4:10pm

Hi, I am currently load testing.

I got 10,000 documents/sec from one shard on a single node.
When I tried a cluster (1 master node, 5 data nodes), the indexing rate decreased to around 8,000 docs/sec.

I don't understand this. Is this normal behavior?

I am using spinning disks (4 × 2 TB disks per node). Does this affect cluster performance?

My nodes use only 10 to 20% of CPU and RAM, but disk utilization is around 95%.
Is that normal for spinning disks?

I am testing the ELK stack.

Christian_Dahlqvist · September 12, 2016, 7:53pm

When you index against the 6-node cluster, how many indices and shards are you indexing into? Are you sending indexing requests to all data nodes in the cluster? Which version of Elasticsearch are you using? What bulk size are you using? How large/complex are your documents?

Chang_Oh_Heo · September 13, 2016, 12:26pm

There are only 2 shards per node, for Marvel and Kibana.
I am testing 2.3.5 and 2.4.0, but the result is the same.
The bulk size is about 4-5m. The documents are not complex; they are just nginx access logs. I am sending requests to all data nodes.

Christian_Dahlqvist · September 13, 2016, 3:43pm

What is the name of the index/indices you are indexing your nginx data into? How many shards/replicas does it have?

Am I reading it correctly that your bulk size is 4-5 million documents? If so, that seems way too high. A few thousand documents per bulk is quite common, as the performance benefits tail off after a certain size and instead start causing problems.
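Christian's suggestion of a few thousand documents per bulk request can be sketched roughly as follows. This is a minimal Python illustration, assuming documents arrive as dicts and that some HTTP client (not shown) posts each body to the _bulk endpoint; the index name, type name and the 2,000-document chunk size are hypothetical values chosen for the example, not recommendations from the thread.

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical values for illustration only; tune them against your own cluster.
BULK_DOCS = 2000
INDEX_NAME = "logstash_access_2016_09_16"  # hypothetical daily index name
DOC_TYPE = "access"  # hypothetical mapping type (Elasticsearch 2.x still uses types)


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch: List[dict], index: str = INDEX_NAME, doc_type: str = DOC_TYPE) -> str:
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ({"message": f"GET /item/{i} HTTP/1.1", "status": 200} for i in range(4500))
    bodies = [bulk_body(b) for b in chunked(docs, BULK_DOCS)]
    print(len(bodies))  # 3 requests: 2000 + 2000 + 500 documents
```

Keeping each request to a bounded number of documents limits how much memory a single bulk request needs on the receiving node, which is the problem oversized bulks tend to cause.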

Chang_Oh_Heo · September 16, 2016, 5:53am

The index name is logstash_access_YYYY_mm_dd, with 5 shards and 0 replicas.
I will try 1-2 thousand documents, but I don't think that is the root cause. Young GC happens 8 times per minute. The old generation heap is about 14g and the young generation is 1.4g. If I increase the young generation size, would it help?

Chang_Oh_Heo · September 19, 2016, 8:14am

I found the root cause.
It was caused by the pipeline-batch-size setting.
I had only modified flush_size in the elasticsearch output plugin, which does not change the real batch size.
Elasticsearch now actually uses CPU and heap when I run with --pipeline-batch-size 3500.
Thanks for your support.
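The root cause can be illustrated with a simplified model. To our understanding, in Logstash 2.2 and later each pipeline worker hands the elasticsearch output one batch of events at a time, so flush_size acts as an upper bound per bulk request rather than a target; the plugin documentation of that era notes that a flush_size larger than the pipeline batch size has no effect (except when a filter grows an in-flight batch). The Python sketch below is an assumption-laden model of that relationship, not Logstash's code. The batch size of 125 (believed to be the 2.x default) and the flush_size of 5000 are illustrative; 3500 is the value from the thread.

```python
# Simplified model (an assumption, not Logstash source code): each pipeline
# worker hands the elasticsearch output one batch of events, and the output
# splits that batch into _bulk requests of at most flush_size events.
from typing import List


def bulk_sizes(batch_events: int, flush_size: int) -> List[int]:
    """Return the sizes of the _bulk requests sent for one pipeline batch."""
    sizes = []
    remaining = batch_events
    while remaining > 0:
        size = min(remaining, flush_size)
        sizes.append(size)
        remaining -= size
    return sizes


def effective_bulk_size(batch_size: int, flush_size: int) -> int:
    """Largest bulk request the model produces, i.e. min(batch_size, flush_size)."""
    return max(bulk_sizes(batch_size, flush_size), default=0)


if __name__ == "__main__":
    # Raising only flush_size leaves requests capped at the batch size.
    print(effective_bulk_size(batch_size=125, flush_size=5000))   # 125
    # Raising --pipeline-batch-size (3500, as in the thread) enlarges the bulks.
    print(effective_bulk_size(batch_size=3500, flush_size=5000))  # 3500
```

Under this model, raising --pipeline-batch-size is what lifts the cap on request size; flush_size only changes anything when it is smaller than the batch.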
