Elasticsearch basic performance

I have 12 servers running logstash. They are sending data to 4 ES nodes. Each ES node has 24 cores, 36TB of spinning disks and 128GB RAM. The 12 logstash machines are sending round robin to the 4 ES nodes. All going into one index, with no replication and 16 shards (4 shards per ES node). Everything has been upgraded to 6.0.

Performance is poor and nothing I can do seems to change anything.
For example, here is a snapshot of the bulk thread pool:

hdp15 bulk                25 270 18912
hdp16 bulk                25 201 22420
hdp13 bulk                25 278 29157
hdp14 bulk                25 560 16598
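
For what it's worth, output in this shape can be pulled with the cat thread pool API; the columns above appear to be node name, pool, active threads, queue, and rejections:

GET _cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected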

The queue is set to 600 but there are lots of rejections, so the ES nodes cannot keep up.
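
For reference, the 600 is the bulk queue setting on each node, along these lines:

# elasticsearch.yml (6.x setting name; static, so it needs a node restart)
thread_pool.bulk.queue_size: 600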

Attached are some screenshots of an ES node. Any ideas on how I might go about increasing performance? I only care about bulk ingestion; I don't care about search performance at this stage.

On the last graph, it's clear that the number of index segments exploded once you started ingestion. As you are on spinning media, segment merging can impact performance badly, so have a look at how to tune this.
Also, given that you don't care about search during indexing, consider changing index.refresh_interval so that fewer segments get created in the first place; see the docs.
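
A minimal sketch, assuming an index called my-index (and setting it back to something like 30s once the bulk load is done):

PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}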

Thanks for the reply.
Unfortunately that has all been removed in 6.0 :frowning:

Store throttling has been removed. As a consequence, the indices.store.throttle.type and indices.store.throttle.max_bytes_per_sec cluster settings and the index.store.throttle.type and index.store.throttle.max_bytes_per_sec index settings are not recognized anymore.

I already have the refresh interval set to -1 and replicas set to 0.

How about index.merge.scheduler.max_thread_count? Also play around with the batch size, although you have probably tried this already.
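
For spinning disks the usual advice is to drop it to 1. A sketch, again assuming an index called my-index (if your version doesn't accept it dynamically, it can go into the index settings at creation time instead):

PUT /my-index/_settings
{
  "index.merge.scheduler.max_thread_count": 1
}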

How many disks do you have per node? Are these set up as multiple data paths or using some RAID configuration? With only 4 active shards per node, are you utilizing all disks when indexing?

What does disk I/O and iowait look like?

What is the average size of your documents? Are you using the default Logstash settings around batch and bulk size?

12 spinning disks per node. Multiple data paths, no RAID
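
Roughly like this in elasticsearch.yml (the mount points below are illustrative, not the actual paths):

# one data path per spinning disk
path.data: /data1,/data2,/data3,/data4,/data5,/data6,/data7,/data8,/data9,/data10,/data11,/data12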

With 4 shards per node I won't be accessing all disks. Should I make it 12 shards per node? That would be 48 shards per index?
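
If I go that route, I assume I would recreate the index with something like this (index name illustrative):

PUT /my-index
{
  "settings": {
    "index.number_of_shards": 48,
    "index.number_of_replicas": 0,
    "index.refresh_interval": "-1"
  }
}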

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     9.61    0.02    0.50     2.20    80.80   158.17     0.08  161.75    3.06  168.42   3.34   0.18
sdc               0.00     7.09    0.03    0.46     3.04    60.27   129.61     0.06  112.80    3.49  119.45   3.68   0.18
sdd               0.00     4.89    0.02    0.23     3.39    40.86   177.80     0.04  179.64    2.97  197.80   2.99   0.07
sdf               0.00     9.17    0.01    0.62     0.79    78.12   126.55     0.09  147.24    7.15  148.98   3.21   0.20
sde               0.00     8.52    0.03    0.58     5.15    72.58   127.08     0.07  111.04    3.40  116.22   3.55   0.22
sdg               0.00    14.40    0.01    1.45     0.89   126.19    87.45     0.15  103.39    7.33  103.92   3.54   0.51
sdb               0.00    13.44    0.05    1.70     4.28   120.60    71.33     0.09   53.94    4.67   55.34   3.15   0.55
sdh               0.00    18.96    0.07    2.39     9.04   170.26    73.01     0.15   62.62    3.13   64.32   2.64   0.65
sdi               0.00     8.01    0.01    0.33     1.59    66.60   200.46     0.08  221.19    5.04  227.82   3.82   0.13
sdk               0.00    19.82    0.05    1.90     4.89   173.13    91.48     0.15   74.63    4.88   76.46   3.90   0.76
sdl               0.00     1.61    0.01    0.11     0.98    13.72   128.81     0.02  149.83   10.26  160.44   4.28   0.05
sdj               0.00    17.04    0.03    1.61     3.02   148.69    92.33     0.13   76.20    4.20   77.74   3.60   0.59

You seem to have a fair bit of await across quite a few of the disks. Did you change the index.merge.scheduler.max_thread_count parameter to 1? Did it make any difference?

How many documents are you sending per bulk request? It might be worth trying to increase pipeline.batch.size to e.g. 1000 in order to send larger bulk requests to Elasticsearch.
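
For example, in logstash.yml on each instance (1000 is just a starting point to experiment with):

# logstash.yml
pipeline.batch.size: 1000
# pipeline.workers can also be tuned; it defaults to the number of CPU cores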

Yes, I have now made that change, and I have increased the shard count to 48, so 12 per node, i.e. one per disk.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   277.40    0.00  169.00     0.00  3373.60    19.96     1.16    6.88    0.00    6.88   5.58  94.29
sdc               0.00   441.70    0.00  129.20     0.00  4452.00    34.46     1.33   10.26    0.00   10.26   7.61  98.38
sdd               0.00   429.20    0.00  152.50     0.00  4505.60    29.54     1.35    8.88    0.00    8.88   6.42  97.96
sdf               0.00   258.20    0.00  150.90     0.00  3099.20    20.54     1.19    7.90    0.00    7.90   6.30  95.06
sde               0.00   327.50    0.00  145.00     0.00  3625.60    25.00     1.27    8.77    0.00    8.77   6.76  98.01
sdg               0.00   310.70    0.00  147.30     0.00  3499.20    23.76     1.19    8.05    0.00    8.05   6.57  96.83
sdb               0.00   268.00    0.00  148.10     0.00  3150.40    21.27     1.11    7.49    0.00    7.49   6.38  94.43
sdh               0.00   656.30    0.00  146.50     0.00  6290.40    42.94     1.54   10.51    0.00   10.51   6.70  98.21
sdi               0.00   283.10    0.00  127.30     0.00  3141.60    24.68     1.30   10.17    0.00   10.17   7.72  98.28
sdk               0.00   288.50    0.00  125.30     0.00  3171.20    25.31     1.22    9.68    0.00    9.68   7.84  98.29
sdl               0.00   254.50    0.00  125.50     0.00  2891.20    23.04     1.18    9.39    0.00    9.39   7.81  98.03
sdj               0.00   377.80    0.00  145.00     0.00  4037.60    27.85     1.28    8.80    0.00    8.80   6.74  97.79

Haven't tried increasing the queue further, but I get lots of these warnings with the queue size set to 600.

[2017-11-27T10:56:28,923][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@4e97be78 on EsThreadPoolExecutor[bulk, queue capacity = 600, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@52739ce6[Running, pool size = 25, active threads = 25, queued tasks = 600, completed tasks = 38363803]]"})

Increasing the shard count might actually make your bulk rejection issues worse, as described in this blog post. You have a lot of Logstash instances sending data to the cluster in parallel, so it may make a lot of sense to increase the pipeline batch size to get fewer, but larger, bulk requests.

How come you have so many Logstash instances for so few Elasticsearch nodes? Generally just a few Logstash instances can saturate a cluster that size unless they are doing very heavy or inefficient processing of events.

I am sending the data from a Hadoop cluster which has 12 nodes. I use the reduce part of an MR job to send the data to the 4 ES nodes. If I set the number of reducers down to 4 then it would only ever have 4 nodes sending to the 4 ES nodes. I will experiment with this, starting with reducers=1!

Hey Paul, may I know which tool you use for monitoring the performance parameters of a node?

I am using (or trying to use) the X-Pack plugin, and on the actual nodes I use iostat.
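
The per-device numbers earlier in the thread are the extended iostat view, i.e. collected with something like:

iostat -x 1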
