I am having an issue after upgrading my cluster from ES 1.7 to 5.2. The reindexing and upgrade were completed two days ago, and I did not have this issue on 1.7 before upgrading.
The bulk update tasks are taking a very long time to finish, sometimes up to an hour. My cluster consists of 5 nodes running on AWS EC2, with 50 shards and 1 replica. All my EBS volumes are Provisioned IOPS volumes with 10,000 IOPS.
The EC2 instances are m4.2xlarge with 8 vCPUs and 32GB of RAM. The heap size is set to 15GB.
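For reference, the index is laid out roughly like this (a sketch using the Python client; the index name is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://10.0.5.168:9200"])

# 50 primary shards with 1 replica each, as described above.
es.indices.create(
    index="my_index",  # placeholder
    body={
        "settings": {
            "number_of_shards": 50,
            "number_of_replicas": 1,
        }
    },
)
```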
I am closely monitoring the nodes and I do not see anything that would explain the slowness. Searching is blazing fast, CPU and memory usage are well within acceptable ranges, used heap stays around 65%, and CPU averages around 20%.
`iostat` shows normal behaviour:
```
Linux 3.13.0-107-generic (ip-10-0-5-168)   03/21/2017   x86_64   (16 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          29.28   0.00     0.26     0.59    0.07  69.80

Device:   tps     kB_read/s  kB_wrtn/s  kB_read    kB_wrtn
xvdf      222.49  4962.23    3983.79    88110729   70737216
```
And this is the output of `_cat/nodes`:
```
ip          heap.percent  ram.percent  cpu  load_1m  load_5m  load_15m  node.role  master  name
10.0.5.16   42            94             6  1.04     1.16     0.61      mdi        -       NODE_01
10.0.4.74   53            99            37  6.37     6.36     5.92      mdi        -       NODE_04
10.0.5.60   68            99            21  4.11     5.00     4.18      mdi        -       NODE_02
10.0.5.168  69            99            31  4.99     5.14     4.57      mdi        *       NODE_03
10.0.4.161  72            99            13  1.08     1.70     1.45      mdi        -       NODE_05
```
And this is a segment of the tasks API response:
Each bulk call consists of up to 100 update requests, each about 300 bytes, and I have 10 threads sending bulk requests concurrently.
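The updater looks roughly like this (a minimal sketch using the Python client; index, type, and field names are placeholders, and `batches_of_100` stands in for my real batch source):

```python
from concurrent.futures import ThreadPoolExecutor

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://10.0.5.168:9200"])

def send_batch(docs):
    # Each bulk call carries up to 100 update actions (~300 bytes each).
    actions = [
        {
            "_op_type": "update",
            "_index": "my_index",  # placeholder
            "_type": "my_type",    # placeholder (mapping types still exist in 5.x)
            "_id": doc["id"],
            "doc": doc["fields"],
        }
        for doc in docs
    ]
    return bulk(es, actions)

# 10 worker threads, each sending bulk requests.
with ThreadPoolExecutor(max_workers=10) as pool:
    for batch in batches_of_100:  # placeholder iterable of 100-doc batches
        pool.submit(send_batch, batch)
```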
I have set the index refresh interval to -1, but there was no noticeable difference.
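I set it like this (again a sketch with the Python client; the index name is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://10.0.5.168:9200"])

# Disable periodic refreshes for the duration of the bulk updates.
es.indices.put_settings(
    index="my_index",  # placeholder
    body={"index": {"refresh_interval": "-1"}},
)
```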
I do not understand why I am having this issue. What might be the cause, and how can I speed things up?