Hello,
I have recently been running into indexing issues with Elasticsearch 5.1.
In Logstash 5.2, I see a lot of errors like this:
[2017-03-06T15:39:12,413][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$6@11b38fce on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4a73e0e4[Running, pool size = 32, active threads = 32, queued tasks = 50, completed tasks = 329787066]]"})
[2017-03-06T15:39:12,414][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
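To read the counters in that rejection message at a glance, they can be parsed out with a small script. This is only a sketch: `parse_rejection` and its regular expression are hypothetical helpers written against the message format shown in the log above; they are not part of Logstash or Elasticsearch.

```python
import re

# Hypothetical helper (not a Logstash/Elasticsearch API): extracts the thread
# pool counters from an es_rejected_execution_exception message.
POOL_RE = re.compile(
    r"EsThreadPoolExecutor\[(?P<name>\w+), queue capacity = (?P<capacity>\d+),"
    r".*?pool size = (?P<pool_size>\d+), active threads = (?P<active>\d+),"
    r" queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def parse_rejection(line):
    """Return the pool counters found in a rejection log line, or None."""
    match = POOL_RE.search(line)
    if match is None:
        return None
    # Every captured field except the pool name is numeric.
    stats = {
        key: (value if key == "name" else int(value))
        for key, value in match.groupdict().items()
    }
    # The pool is saturated when every thread is busy and the queue is full.
    stats["saturated"] = (
        stats["active"] >= stats["pool_size"]
        and stats["queued"] >= stats["capacity"]
    )
    return stats


# The "reason" part of the log line quoted above.
sample = (
    'rejected execution of org.elasticsearch.transport.TransportService$6@11b38fce '
    'on EsThreadPoolExecutor[bulk, queue capacity = 50, '
    'org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4a73e0e4'
    '[Running, pool size = 32, active threads = 32, queued tasks = 50, '
    'completed tasks = 329787066]]'
)

print(parse_rejection(sample))
```

For the line quoted above, this reports the bulk pool with all 32 threads active and the queue full (50 of 50), so the 429 responses come from the Elasticsearch bulk thread pool being saturated at that moment.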
Increasing the number of pipeline workers (from 1 to 10, or even 40) changes nothing.
I don't see any issue with the hardware:
top - 15:53:38 up 34 days, 4:08, 3 users, load average: 2.05, 1.63, 1.50
Tasks: 468 total, 2 running, 466 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.3 us, 0.8 sy, 0.0 ni, 94.6 id, 0.2 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32644560 total, 2012804 free, 19934180 used, 10697576 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11892140 avail Mem
iotop shows that my IO usage is 7%.
iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.80    0.00    1.41    0.13    0.00   87.67
Device:       tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda        502.00        0.00     8872.00         0      8872
sdb        204.00        0.00     2280.00         0      2280
dm-0         0.00        0.00        0.00         0         0
dm-1         0.00        0.00        0.00         0         0
dm-2      1703.00        0.00     8872.00         0      8872
dm-3       521.00        0.00     2280.00         0      2280
JVM memory usage is only 40%.
As I'm not limited by IO, CPU, RAM or JVM memory, I don't see where the bottleneck is. My latency is ~0.30 ms (0.20 ms when I don't send documents). Furthermore, my index rate drops to 0 doc/s roughly every minute to minute and a half, leaving "holes" in the graph.
I've read this page (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/tune-for-indexing-speed.html), but it didn't help me identify the bottleneck.
I could of course increase the bulk queue size in ES, but I've read that this is not a solution for constant throughput, as it only delays the issue and does not fix it.
Do you have any idea how I can troubleshoot this?
Regards,
Grégoire