ES Write timeout

I have a 3 data node cluster. Each node has 15 GB of memory for ES and 15 GB left for Lucene (the OS file cache).
Each record is around 5 KB in size, with roughly 100 fields.
I am pumping documents in at 5,000 per second.
After some time I start getting write timeouts.
Is that too much to expect from this configuration, or should I start looking into some parameters?
Memory and CPU usage look normal, not too high.
I/O wait also seems to be okay... I don't see a huge wait.
Any suggestions would be very helpful.

I have 2 HDDs. Initially I had 48 threads writing to ES but reduced it to 32, thinking that could be the bottleneck. I had set the batch size to 5000. For the first few hours batches complete within milliseconds, then the time gradually increases to seconds, then up to 30 seconds, and the timeout happens.

I don't have any full-text search enabled; string fields are mapped with "index": "not_analyzed", and I have also set:
"doc_values": true,
"norms": { "enabled": false }

I have 2 indices and am pumping into each at a rate of 2,500 messages per second.

I have set the write timeout to 30s.

ESMetricStore - onFailure : Error in bulk indexing
org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.
	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:70) ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
	at net.appcito.elasticsearch.util.ESMetricStore.executeBatch(ESMetricStore.java:375) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
	at net.appcito.elasticsearch.util.ESMetricStore.execute(ESMetricStore.java:290) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
	at net.appcito.elasticsearch.util.ESMetricStore$MetricStoreTask$ExecuteMetricStoreTask.run(ESMetricStore.java:736) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_111]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_111]
	at
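
For context, a rough sketch of how a bulk batch with the 30s timeout is submitted with the 2.x Java client (illustrative only, not the actual executeBatch() code; the index and type names are made up):

import java.util.List;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class BulkBatchSketch {

    // Illustrative only: submits one batch of pre-serialised JSON documents
    // and blocks for up to 30s, which is where the ElasticsearchTimeoutException
    // in the trace above gets thrown.
    public static BulkResponse indexBatch(Client client, List<String> jsonDocs) {
        BulkRequestBuilder bulk = client.prepareBulk();
        for (String doc : jsonDocs) {
            // Index/type names are placeholders; no ids are set, so ES assigns them.
            bulk.add(client.prepareIndex("metrics-1", "metric").setSource(doc));
        }
        // actionGet(TimeValue) waits for the bulk response and times out after 30s.
        return bulk.execute().actionGet(TimeValue.timeValueSeconds(30));
    }
}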

Which version of Elasticsearch are you using? What do CPU usage, disk I/O and iowait look like when it slows down? How many shards are you indexing into? Are you writing new data or updating existing documents? Are you assigning your own document ids or letting Elasticsearch set them?

I am using ES version 2.4. There are 3 shards, the replication factor is one and the refresh interval is 10s. There are no updates, only writes. Data never gets updated; only reads and writes of new data happen. ES assigns the document ids.
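
For reference, those index settings correspond to something like this at index creation time (the index name is just an example):

PUT /metrics-1
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "10s"
  }
}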

Linux 3.10.0-514.26.2.el7.x86_64 (localhost.localdomain) 	Friday 29 September 2017 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      56.22    0.13    5.82    5.98    0.00   31.85

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              35.23    39.98   85.61  117.33  2751.92 10946.02   135.00    26.26  129.40   75.73  168.56   3.85  78.07
dm-0              0.00     0.00    0.34   19.12     7.07    87.84     9.75     0.85   43.77   26.80   44.07  19.52  38.00
dm-1              0.00     0.00   39.39   38.89   157.57   155.56     8.00    14.70  187.79   61.07  316.16   3.44  26.91
dm-2              0.00     0.00   81.42   86.71  2587.00 10702.60   158.08    26.97  160.37   79.44  236.36   4.51  75.83

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      87.26    0.06    6.47    0.75    0.00    5.46

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              16.00     2.00   12.00  185.00  1848.00  4292.00    62.34     3.58   18.36   25.42   17.90   5.07  99.80
dm-0              0.00     0.00    0.00   32.00     0.00   116.00     7.25     0.66   20.81    0.00   20.81  15.25  48.80
dm-1              0.00     0.00   14.00    0.00    56.00     0.00     8.00     0.29   20.71   20.71    0.00   4.00   5.60
dm-2              0.00     0.00   15.00  126.00  1920.00  4176.00    86.47     3.05   21.82   25.80   21.34   7.07  99.70

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      91.22    0.13    7.84    0.06    0.00    0.75

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               5.00     0.00   60.00  120.00  5476.00  2756.00    91.47     1.90   10.48    1.53   14.95   4.60  82.80
dm-0              0.00     0.00    0.00   14.00     0.00    84.00    12.00     0.26   16.93

I also see a huge lag in the document counts. Here is an example of the counts captured every 100 seconds:

{
"TopIndicator1": {
"applicationId_count": {
"1506793300000": 310038,
"1506793400000": 499966,
"1506793500000": 499979,
"1506793600000": 500005,
"1506793700000": 499995,
"1506793800000": 499994,
"1506793900000": 499987,
"1506794000000": 499970,
"1506794100000": 500017,
"1506794200000": 499967,
"1506794300000": 499992,
"1506794400000": 500009,
"1506794500000": 499967,
"1506794600000": 499994,
"1506794700000": 499984,
"1506794800000": 499992,
"1506794900000": 500006,
"1506795000000": 499970,
"1506795100000": 478786,
"1506795200000": 499988,
"1506795300000": 499989,
"1506795400000": 499986,
"1506795500000": 499962,
"1506795600000": 499987,
"1506795700000": 499988,
"1506795800000": 500007,
"1506795900000": 499971,
"1506796000000": 499987,
"1506796100000": 499992,
"1506796200000": 500005,
"1506796300000": 499990,
"1506796400000": 499971,
"1506796500000": 500006,
"1506796600000": 499987,
"1506796700000": 499990,
"1506796800000": 499987,
"1506796900000": 478099,
"1506797000000": 500002,
"1506797100000": 499993,
"1506797200000": 499965,
"1506797300000": 500012,
"1506797400000": 499967,
"1506797500000": 499992,
"1506797600000": 500009,
"1506797700000": 499991,
"1506797800000": 499987,
"1506797900000": 499989,
"1506798000000": 499996,
"1506798100000": 499986,
"1506798200000": 499994,
"1506798300000": 499985,
"1506798400000": 499966,
"1506798500000": 500009,
"1506798600000": 499962,
"1506798700000": 478794,
"1506798800000": 500006,
"1506798900000": 499962,
"1506799000000": 500013,
"1506799100000": 496814,
"1506799200000": 474680,
"1506799300000": 467449,
"1506799400000": 407038,
"1506799500000": 265276,
"1506799600000": 249959,
"1506799700000": 249607,
"1506799800000": 249918,
"1506799900000": 250009,
"1506800000000": 250321,
"1506800100000": 250068,
"1506800200000": 250205,
"1506800300000": 250111,
"1506800400000": 250388,
"1506800500000": 240035,
"1506800600000": 250100,
"1506800700000": 249970,
"1506800800000": 249779,
"1506800900000": 250320,
"1506801000000": 249591,
"1506801100000": 249841,
"1506801200000": 250021,
"1506801300000": 249759,
"1506801400000": 249718,
"1506801500000": 250330,
"1506801600000": 249687,
"1506801700000": 249897,
"1506801800000": 249783,
"1506801900000": 237220,
"1506802000000": 140694,
"1506802100000": 22457
}
}
}

Do these 3 nodes have 2 directly attached HDDs each, or are they sharing the same disks? It looks like the disks are quite busy with a fair amount of await, so it could be that your storage is limiting performance. Do you see anything in the logs about merging? Do you have any non-default configuration for these nodes?

In this setup they share the HDDs. But we have also tested with 3 separate SSDs and face the same issue there... Writing to ES decreases exponentially. Why is it not linear? It gives consistent write numbers for some hours and then gradually decreases...

You say that it is initially fast - is that following a restart, after starting to write to a new index, or something else? What is the specification of the Elasticsearch nodes? Do you have any non-standard configuration?

Yes, you are correct, it is initially fast... There is no restart happening... Here are the non-default settings:

node.max_local_storage_nodes: 1
index.mapper.dynamic: true
action.auto_create_index: true
action.disable_delete_all_indices: true
gateway.expected_nodes: 1
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: metrics-master

discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_retries: 5
discovery.zen.fd.ping_timeout: 120s
gateway.expected_data_nodes: 1
gateway.expected_master_nodes: 1
gateway.recover_after_data_nodes: 1
gateway.recover_after_master_nodes: 1
gateway.recover_after_time: 5m
http.cors.enabled: false
index.refresh_interval: 20s
index.translog.interval: 20s
index.warmer.enabled: false
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
indices.store.throttle.type: none
script.engine.groovy.inline.aggs: on
script.engine.groovy.inline.update: on
script.indexed: on
script.inline: on
threadpool.bulk.queue_size: 2000
bootstrap.memory_lock: true
indices.store.throttle.max_bytes_per_sec: 100mb

I would recommend installing Marvel so that we can see how the cluster behaves over time and potentially spot patterns. I still do not understand what the conditions are when it is initially fast (restart, new index being written to etc) compared to when it slows down.
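
As an aside, since you have threadpool.bulk.queue_size set to 2000, it may also be worth keeping an eye on the bulk queue size and rejected counts per node while it slows down, for example with the cat thread pool API:

GET /_cat/thread_pool?v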

OK, I am going to install Marvel and check. Meanwhile, can you look into the configuration I gave above and let me know your feedback?

Hi, here I have attached the Marvel results from one node; they are pretty similar for all 3 nodes.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     2.00    0.00   86.00     0.00 15456.00   179.72     8.76  101.74    0.00  101.74  11.62  99.95
dm-0              0.00     0.00    0.00   12.00     0.00    76.50     6.38     0.93   77.25    0.00   77.25  41.33  49.60
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   68.50     0.00 15379.50   224.52     8.24  120.17    0.00  120.17  14.58  99.90

10/03/2017 01:36:46 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       3.54    0.00    2.23    9.30    0.00   84.93

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00   47.50     0.00  1870.00    39.37     2.67   55.76    0.00   55.76  21.05 100.00
dm-0              0.00     0.00    0.00   14.50     0.00   132.00     9.10     0.63   43.76    0.00   43.76  29.31  42.50
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   26.50     0.00  1744.00    65.81     2.03   76.04    0.00   76.04  37.74 100.00

10/03/2017 01:36:48 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       6.31    0.00    3.49   10.65    0.04   79.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     1.50    0.00   76.00     0.00 11338.00   149.18     4.19   55.47    0.00   55.47  13.14  99.90
dm-0              0.00     0.00    0.00   15.50     0.00   100.50     6.48     0.93   59.74    0.00   59.74  41.10  63.70
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   55.00     0.00 11231.50   204.21     3.33   60.86    0.00   60.86  18.12  99.65

10/03/2017 01:36:50 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       4.87    0.04    2.48    5.84    0.00   86.76

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00   42.50     0.00  2629.00    61.86     1.71   40.16    0.00   40.16  22.76  96.75
dm-0              0.00     0.00    0.00   11.50     0.00   120.00    10.43     0.38   33.00    0.00   33.00  29.65  34.10
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   24.50     0.00  2509.00   102.41     1.33   54.16    0.00   54.16  39.08  95.75

Can you guide me on whether anything is wrong with this?

"1507036380000": 38232,
"1507036390000": 55522,
"1507036400000": 50005,
"1507036410000": 50002,
"1507036420000": 50001,
"1507036430000": 49998,
"1507036440000": 49987,
"1507036450000": 50014,
"1507036460000": 49995,
"1507036470000": 50001,
"1507036480000": 50000,
"1507036490000": 49999,
"1507036500000": 49997,
"1507036510000": 50000,
"1507036520000": 50000,
"1507036530000": 50000,
"1507036540000": 49997,
"1507036550000": 49997,
"1507036560000": 50002,
"1507036570000": 49998,
"1507036580000": 50002,
"1507036590000": 49998,
"1507036600000": 49999,
"1507036610000": 49999,
"1507036620000": 49982,
"1507036630000": 49997,
"1507036640000": 49999,
"1507036650000": 50014,
"1507036660000": 50000,
"1507036670000": 50000,
"1507036680000": 49998,
"1507036690000": 50000,
"1507036700000": 48415,
"1507036710000": 40669,
"1507036720000": 36761,
"1507036730000": 29069,
"1507036740000": 14685,
"1507036750000": 8353,
"1507036760000": 6000,
"1507036770000": 2933,
"1507036780000": 1548,
"1507036790000": 1344,
"1507036800000": 740,
"1507036810000": 108

As I see here, there is a lag of around 25 minutes.

"index": {
"creation_date": "1507033167373",
"uuid": "QBbHwOa-Rz2fW2Tm39V0zw",
"replication": "async",
"number_of_replicas": "1",
"number_of_shards": "3",
"refresh_interval": "1s",
"version": {
"created": "2040199"
}

@Christian_Dahlqvist Hi, can you look into this and suggest any improvements we can make?

Can you show the monitoring graphs for indexing rate as well?

It's for 12 hours.

Here is the iostat for one of the nodes

10/04/2017 04:56:59 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      16.67    0.13    3.37    5.56    0.00   74.27

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00   60.50     0.00  1602.50    26.49     1.68   25.50    0.00   25.50  14.88  90.05
dm-0              0.00     0.00    0.00   15.50     0.00    96.00     6.19     0.32   20.39    0.00   20.39  17.84  27.65
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   37.00     0.00  1513.00    40.89     1.36   33.16    0.00   33.16  24.30  89.90

10/04/2017 04:57:01 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      17.97    0.00    2.40   11.15    0.00   68.48

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     2.50    0.00   92.50     0.00  9335.50   100.92     3.46   38.85    0.00   38.85  10.51  97.20
dm-0              0.00     0.00    0.00   29.00     0.00   207.50     7.16     1.63   56.14    0.00   56.14  19.19  55.65
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   58.00     0.00  9124.00   157.31     1.88   34.59    0.00   34.59  16.54  95.95

10/04/2017 04:57:03 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      16.86    0.00    3.00    6.25    0.04   73.85

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.50    0.00   68.00     0.00  3343.50    49.17     1.82   26.68    0.00   26.68  14.43  98.15
dm-0              0.00     0.00    0.00   18.00     0.00   145.50     8.08     0.42   23.75    0.00   23.75  14.64  26.35
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   40.50     0.00  3202.00    79.06     1.40   34.32    0.00   34.32  23.90  96.80

10/04/2017 04:57:05 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      14.78    0.00    2.40    5.64    0.04   77.14

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.50    0.00   52.50     0.00  1804.00    34.36     1.94   36.85    0.00   36.85  19.04  99.95
dm-0              0.00     0.00    0.00   14.00     0.00    93.50     6.68     0.53   37.64    0.00   37.64  19.14  26.80
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   31.00     0.00  1702.50    54.92     1.43   46.08    0.00   46.08  32.23  99.90

10/04/2017 04:57:07 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      18.33    0.00    2.95   10.79    0.04   67.89

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     1.00    0.00   96.00     0.00  9934.00   103.48     9.06   94.62    0.00   94.62  10.35  99.40
dm-0              0.00     0.00    0.00   16.50     0.00    96.00     5.82     0.65   39.97    0.00   39.97  23.06  38.05
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   72.50     0.00  9860.00   136.00     8.60  118.88    0.00  118.88  13.68  99.15

10/04/2017 04:57:09 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      15.97    0.00    3.83   13.19    0.00   67.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.50    0.00   58.50     0.00  1370.00    23.42     3.10   52.02    0.00   52.02  16.97  99.30
dm-0              0.00     0.00    0.00   22.00     0.00   132.00     6.00     1.28   58.48    0.00   58.48  32.09  70.60
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   31.00     0.00  1220.00    39.35     1.84   57.63    0.00   57.63  31.79  98.55

10/04/2017 04:57:11 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      15.50    0.04    3.29   12.26    0.00   68.91

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     2.50    0.00  101.50     0.00 11687.00   115.14     8.22   81.12    0.00   81.12   9.84  99.85
dm-0              0.00     0.00    0.00   31.00     0.00   212.50     6.85     1.85   59.19    0.00   59.19  24.61  76.30
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00   66.50     0.00 11478.50   172.61     6.73  101.82    0.00  101.82  14.78  98.30

10/04/2017 04:57:13 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
      15.59    0.00    2.27   10.26    0.04   71.84

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00    30.00    0.00  138.00     0.00  3897.50    28.24    15.12  108.36    0.00  108.36   7.11  98.15
dm-0              0.00     0.00    0.00   21.50     0.00   145.50     6.77     0.59   28.16    0.00   28.16  18.26  39.25
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00  137.50     0.00  3846.50    27.97    19.45  140.14    0.00  140.14   7.11  97.80

I do not see anything that stands out, but noticed that you have set indices.store.throttle.max_bytes_per_sec to 100mb. This is generally only recommended if you have SSDs. Could it be that merging activity is causing the slowdown? How does indexing throughput change if you remove this setting?

Do you want me to remove this setting and check?

indices.store.throttle.max_bytes_per_sec to 100mb

The suggested settings helped me... I tested this on a one node setup with 8 GB + 8 GB memory and 16 CPUs and was able to achieve 4K messages per second.
I want to understand more.

I have removed the below conf:
indices.store.throttle.max_bytes_per_sec: 100mb

I had set:
index.merge.scheduler.max_thread_count: 1

I increased the below conf to 1gb:
index.translog.flush_threshold_size
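
For reference, this is roughly how I understand the translog threshold can be applied per index via the update settings API in 2.x (the index name is just an example; if the setting is not dynamically updatable it would instead go into elasticsearch.yml):

PUT /metrics-1/_settings
{
  "index.translog.flush_threshold_size": "1gb"
}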

With 1gb for index.translog.flush_threshold_size I ran at 4K, but when I tried increasing to 6K I again see more I/O wait and high disk utilization...

I then tried increasing the above conf to 2gb and running at 6K, but it didn't help at all... There was a huge lag, around 15 minutes of lag for one hour of data...

Can you explain the impact of keeping index.translog.flush_threshold_size at 1gb, and whether increasing it to 2gb is a bad configuration?

Is there any other conf I should really look into which will help in increasing performance?

I will make all these changes in the 3 node cluster setup and come back if I face any issues.

Thank you for the support.

One more question: in a single node setup, if I set the number of shards to 3, what is the impact? The site suggests 1... Initially I had 3 and changed it to 1... Will it have any impact on performance? I can understand it is going to help in cluster mode, but how about in a single node setup?

In a single node setup, what is the advantage of having a replica? I don't think it is going to help with failover or read distribution... I changed replicas to 0; will it have any other impact?

Most default settings in Elasticsearch are quite sensible, so start with the default and measure throughput. Then adjust the setting and measure again. Repeat until no further improvement is seen. Remember that larger values do not automatically give better performance.

In a single node setup replicas do not come into play, as Elasticsearch will not place a primary and replica of the same shard on the same node. The ideal number of shards will depend on the use case, so you should benchmark.
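
For example, on a single node you could start from something like this and benchmark from there (the index name is only a placeholder):

PUT /metrics-1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}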