Poor write performance

Hi.
For the last few weeks I have been having performance trouble with write queries (the load is higher than it used to be). I don't think there is anything to optimize on the app side - all queries are sent to _bulk with about 300 operations per request. Most of them are updates, and most of those use update scripts in Painless (there is some non-trivial logic). The average duration of one bulk request is about 8-12 s, which is terrible. It should be around 1 s (and it used to be about that).
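To illustrate the shape of the requests, here is a minimal sketch (Python with the requests library; the index name, IDs and the Painless script are placeholders, the real scripts are more complex):

import json
import requests

# Roughly the shape of one bulk request (the real ones carry ~300 operations).
actions = []
for doc_id, increment in [("1", 5), ("2", 3)]:
    actions.append({"update": {"_index": "my-index", "_type": "_doc", "_id": doc_id}})
    actions.append({"script": {
        "lang": "painless",
        "source": "ctx._source.counter += params.n",
        "params": {"n": increment},
    }})

# _bulk expects newline-delimited JSON with a trailing newline.
body = "\n".join(json.dumps(a) for a in actions) + "\n"
resp = requests.post("http://localhost:9200/_bulk", data=body,
                     headers={"Content-Type": "application/x-ndjson"})
resp.raise_for_status()
print(resp.json()["took"], resp.json()["errors"])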

All requests go through RabbitMQ, so I can gather some stats about the requests and throttle the speed, ...

Hardware: we have 3 nodes with enough disk space, CPU and RAM (they are virtual servers, but with fast storage). Everything looks good. We tried adding a fourth node, but without any performance impact. Btw, each node has all roles (master, data, ingest, ...).

Do you have any suggestions on which metrics to watch and how to solve our problem?

I have one idea, but I'm not sure whether it can help. I could divide the indices into 2 groups with a roughly equal number of writes, allocate these groups to different sets of nodes (so there would be 2+2 nodes, each set holding only one group of indices), and then update the workers (which listen on the RabbitMQ queues) to send requests only to the appropriate nodes of the cluster.
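If I understand shard allocation filtering correctly, that could look roughly like this (a sketch; the attribute name "group" and the index name are made up, and it assumes each node gets node.attr.group set in elasticsearch.yml):

import requests

# Assumption: elasticsearch.yml on the first pair of nodes contains
#   node.attr.group: groupA
# and on the second pair
#   node.attr.group: groupB
# Pin an index to one group via index-level allocation filtering.
resp = requests.put(
    "http://localhost:9200/heavy-write-index-1/_settings",
    json={"index.routing.allocation.require.group": "groupA"},
)
resp.raise_for_status()
print(resp.json())

The workers would then point their HTTP clients only at the nodes of the matching group, so both the coordinating work and the shard-level indexing for that group of indices should stay on those nodes.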

Splitting into 2 clusters is not possible, because there are also some read requests which need data from both groups of indices (the read requests are fast enough).

Which version of Elasticsearch are you using? How frequently are you updating the same documents? Is this something that may have changed as load has increased? Do you have monitoring installed so you can check for trends over time, which may help identify what is going on?

6.4.1, JVM: 1.8.0_181

I don't have numbers for this, but it happens from time to time. Some of the bulk requests (not all) have retry_on_conflict=3 set. My estimate is about 100 conflicts per day.

I don't think so. Previously there was no RabbitMQ queue and queries were sent directly from the app (which was terrible to debug). Then I rewrote the part of the app which stores data into separate workers (listening on RabbitMQ); it worked fine for about a month, and then things went wrong...

Yes, we use Kibana's monitoring, but there is only 1 week of data, so it is useless for comparing "before" and "now".

If you are updating the same documents frequently, without a refresh having taken place, this can result in a lot of small refreshes that can affect performance negatively. Do you see a lot of very small segments being generated?

I am not sure whether I am looking at the correct data... is it the _cat/segments endpoint?
The output looks like this: https://drive.google.com/open?id=1btH0w8YkoWq69_wVCo_teeTtLs0QPQ51 - I have no idea whether that is too much or not. Our cluster currently has 65 indices, 593 shards and about 3 TB of data.
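For reference, the segment list and the refresh/merge counters can be pulled roughly like this (a sketch; the host is a placeholder):

import requests

BASE = "http://localhost:9200"

# Segment list sorted by size; many tiny segments can hint at refreshes
# happening between updates of the same documents.
segments = requests.get(
    BASE + "/_cat/segments",
    params={"v": "true", "h": "index,shard,segment,docs.count,size", "s": "size"},
)
print(segments.text)

# Refresh and merge totals per index (compare the counts over time).
stats = requests.get(BASE + "/_stats/refresh,merge").json()
for index, data in stats["indices"].items():
    refresh = data["total"]["refresh"]
    merges = data["total"]["merges"]
    print(index, refresh["total"], merges["total"])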

Btw, I talked with Philipp Krenn and he assured me that you are the right person to solve our problem :slight_smile:

Have you tried capturing and inspecting some of the bulk requests to see if the same documents are repeatedly updated?

I am sure that within a single bulk request there are no multiple updates of the same document. Maybe across different bulk requests in a short time, but I don't think it happens that often. I will set up a check for it and let you know the results.

What does disk usage look like?

If you mean capacity, then there is enough free space (total space: about 1.2 TB per node, free space: 340 GB per node).
If you mean speed (taken from iostat sdX):
Node     tps       kB_read/s    kB_wrtn/s    kB_read        kB_wrtn
node1    3351,08   74368,52     54220,19     78372430496    57139336538
node2    3015,16   59744,43     51363,28     62937946174    54108797165
node3    3245,06   60662,93     53722,02     63793611777    56494499654
node4    3236,35   72683,20     59176,57     63215987610    51468641728

That looks OK to me (but I have nothing to compare it with). Today we are also trying to move the nodes to 3 physical servers (it currently runs in VMware). There will also be more RAM and CPU threads...

What does iostat -x give? What kind of storage do you have?

iostat -x sdX
Node     rrqm/s   wrqm/s   r/s       w/s      rkB/s      wkB/s      avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
node1    0,04     25,35    3118,88   231,15   74330,23   54222,04   76,75     2,34      0,70   1,24     11,03    0,12   38,74
node2    0,06     21,93    2802,10   213,04   59751,62   51379,14   73,72     3,64      1,21   0,91     5,05     0,11   34,61
node3    0,07     24,26    3022,66   222,85   60677,04   53735,54   70,50     2,00      0,62   1,16     11,59    0,12   39,09
node4    0,43     26,72    2913,43   324,40   72727,95   59190,43   81,49     0,07      0,02   1,46     2,29     0,17   54,48

It is some kind of network storage built from SSD-only drives. Our provider says they removed any IOPS limit on the storage (the new physical servers will have local SSD drives).

I did some logging and it looks like there are some "conflicts"... I use Redis for this: I scan all bulk requests that are about to be sent to Elasticsearch, store the ID of each document in Redis with a 60 s expiration (refresh_interval=10s), and while processing a bulk request I check whether the key already exists; if it does, I increase a counter. The counter keeps increasing, which means there are some "conflicts" (= attempts to update the same document within 60 s)... I will dig into it more deeply.
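The check in the worker is roughly this (a minimal sketch, assuming Python and redis-py; the key names are made up):

import redis

r = redis.Redis(host="localhost", port=6379)
WINDOW = 60  # seconds, comfortably above the 10 s refresh_interval

def count_recent_duplicates(bulk_actions):
    """Remember each outgoing document ID for WINDOW seconds; if it was
    already seen, count it as a potential update conflict."""
    for action in bulk_actions:
        meta = action.get("update") or action.get("index") or {}
        doc_id = meta.get("_id")
        if doc_id is None:
            continue  # skips the script/document lines of the bulk body
        key = "seen:%s:%s" % (meta.get("_index"), doc_id)
        # SET ... NX returns a falsy value if the key already exists.
        if not r.set(key, 1, ex=WINDOW, nx=True):
            r.incr("conflict_counter")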
