Indexing rate performance in cluster


(Chang Oh Heo) #1

Hi, I am currently load testing.

I got 10,000 documents/sec from one shard on a single node.
I then tried a cluster (1 master node, 5 data nodes),
but the indexing rate decreased to around 8,000 docs/sec.

I don't understand this. Is this normal behavior?

I am using spinning disks (4 x 2 TB disks per node). Does that affect cluster performance?

My nodes use only 10 to 20% of CPU and RAM, but disk utilization is around 95%.
Is that normal for spinning disks?

I am testing the ELK stack.


(Christian Dahlqvist) #2

When you index against the 6 node cluster, how many indices and shards are you indexing into? Are you sending indexing requests to all data nodes in the cluster? Which version of Elasticsearch are you using? What bulk size are you using? How large/complex are your documents?


(Chang Oh Heo) #3

There are only 2 shards per node, for Marvel and Kibana.
I am testing 2.3.5 and 2.4.0, but the result is the same.
Bulk size is about 4~5m. The documents are not complex; they are just nginx access logs. I am sending requests to all data nodes.


(Christian Dahlqvist) #4

What is the name of the index/indices you are indexing your nginx data into? How many shards/replicas does it have?

Am I reading it correctly that your bulk size is 4-5 million documents? If so, that seems way too high. A few thousand documents per bulk is quite common, as the performance benefits tail off after a certain size and instead start causing problems.
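
To make "a few thousand documents per bulk" concrete, here is a minimal sketch of splitting a document stream into fixed-size batches and building the newline-delimited body that the `_bulk` endpoint expects. The index name (one hypothetical day of the naming pattern used in this thread), the `logs` type, the sample documents, and the batch size of 3000 are all assumptions for illustration; nothing is sent to a cluster.

```python
import json
from itertools import islice


def chunked(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(batch, index, doc_type="logs"):
    """Build an NDJSON _bulk body: one action line plus one source line per doc.

    `doc_type` is a hypothetical type name; Elasticsearch 2.x needs some type.
    """
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data standing in for parsed nginx access log lines.
docs = ({"status": 200, "request": "/page/%d" % i} for i in range(7000))

# Assumed index name and batch size, for illustration only.
bodies = [bulk_body(b, "logstash_access_2016_09_01") for b in chunked(docs, 3000)]

# Each document contributes two lines, so halve the newline count.
print([body.count("\n") // 2 for body in bodies])  # [3000, 3000, 1000]
```

Each string in `bodies` could then be posted to `_bulk` as a separate request. The right batch size depends on document size and hardware, so it is worth measuring a few values rather than assuming one.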


(Chang Oh Heo) #5

The index name is logstash_access_YYYY_mm_dd.
5 shards, 0 replicas.
I will try 1~2 thousand documents, but I don't think that is the root cause. Young GC happens 8 times per minute. The old heap is about 14g and the young heap is 1.4g. If I increase the young heap, would that help?


(Chang Oh Heo) #6

I found the root cause.
It was caused by the pipeline-batch-size setting.
I had only modified flush_size in the elasticsearch output plugin,
but that does not change the actual batch size.
ES now uses CPU and heap when I run with --pipeline-batch-size 3500.
Thanks for your support.
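
For anyone hitting the same thing, here is a rough model of why changing flush_size alone had no effect. It assumes, as I understand Logstash 2.2 and later, that the elasticsearch output only receives events in pipeline batches, so flush_size acts as an upper limit and the bulk request can never be larger than the pipeline batch (default 125). The function names and the flush_size of 5000 are hypothetical; this is a simplified sketch, not Logstash's actual code.

```python
import math

# Default pipeline batch size in Logstash 2.2+ (events per worker batch).
DEFAULT_PIPELINE_BATCH_SIZE = 125


def effective_bulk_size(flush_size, pipeline_batch_size=DEFAULT_PIPELINE_BATCH_SIZE):
    """Simplified model: a bulk request cannot exceed either limit."""
    return min(flush_size, pipeline_batch_size)


def bulk_requests_needed(total_events, flush_size,
                         pipeline_batch_size=DEFAULT_PIPELINE_BATCH_SIZE):
    """Approximate number of bulk requests needed to ship `total_events`."""
    size = effective_bulk_size(flush_size, pipeline_batch_size)
    return math.ceil(total_events / size)


# Hypothetical flush_size of 5000 with the default pipeline batch size:
print(effective_bulk_size(5000))                  # 125
print(bulk_requests_needed(10000, 5000))          # 80

# Same flush_size with --pipeline-batch-size 3500, as in the post above:
print(effective_bulk_size(5000, 3500))            # 3500
print(bulk_requests_needed(10000, 5000, 3500))    # 3
```

Under this model the cluster was being fed many tiny bulk requests, which would fit the low CPU and heap usage on the Elasticsearch side.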

