Cluster details: 7 nodes x r4.2xlarge (8 vCPU, 61 GB RAM, 32 GB assigned to ES) on AWS.
I have a 650 GiB volume per node, so around 4.5 TB of disk space.
Index size: 1.7B documents / 1.8 TB / 28 shards, 1 replica.
We had to do some heavy ingestion of historical data, so until now we had set (replicas=0, refresh_interval=-1) to help bulk ingestion. One week of data, which is around 50 GB, was taking less than an hour to ingest. We then changed the settings to (replicas=1, refresh_interval=-1), and now it's taking 11 hours to ingest the same data! Am I missing something fundamental about replicas, or is it possible to have them as read-only?
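The before/after settings pattern described here can be sketched as follows. This is a minimal illustration of the two `PUT <index>/_settings` bodies involved; the helper function name is made up for the example, and the `1s` post-load refresh interval is just the Elasticsearch default rather than anything from this thread.

```python
# Sketch of the bulk-ingest pattern from this thread: drop replicas and
# disable refresh before a heavy historical load, then restore both after.
# The bodies below are what you would send to PUT /<index>/_settings.

def bulk_ingest_settings(ingesting: bool) -> dict:
    """Build the index-settings body for before/after bulk loading."""
    if ingesting:
        # During the load: no replicas, refresh disabled.
        return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}
    # After the load: re-enable the replica and a normal refresh interval.
    return {"index": {"number_of_replicas": 1, "refresh_interval": "1s"}}

print(bulk_ingest_settings(True))
print(bulk_ingest_settings(False))
```

Re-enabling the replica afterwards makes Elasticsearch copy every primary over the network in one go, which is usually cheaper overall than replicating each bulk request during the load.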
I thought of index.translog.durability=async, but it doesn't look like the right thing to do, since it might lead to data loss if a node crashes.
Appreciate any help.
What version, what OS, what JVM?
Are you sure it's 32GB of heap?
Sorry, I missed the version information:
30 GB heap per node (not 32 GB), 210 GB for the cluster
Linux Red Hat 4.8.3-9 4.4.15-25.57.amzn1.x86_64
Are you assigning document IDs at the application level? Are you indexing into all these indices/shards? How large a percentage of operations are updates? What type of EBS are you using? What does iostat show while you are indexing?
Are you assigning document IDs at the application level?
Yes. We are, unfortunately, using parent-child, where the parent is a user and the children are events.
Are you indexing into all these indices/shards?
Yes, indexing to all the shards (every node has 4 primary and 4 replica shards).
How large a percentage of operations are updates? What type of EBS are you using? What does iostat show while you are indexing?
Users (parents) can be repeated (40-50%), but events (children) normally are not. Using provisioned IOPS with 1300 IOPS.
Linux 4.4.15-25.57.amzn1.x86_64 (ip-xxx-xx-xx-xx) 31/07/18 x86_64 (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
4.70 0.00 0.13 1.62 0.01 93.53
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
xvda 132.44 2438.90 3037.69 21919907991 27301623912
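A quick sanity check on the iostat line above, assuming iostat's standard 512-byte blocks: at ~132 transfers/s against 1300 provisioned IOPS and only ~1.5 MB/s of writes, the disk does not look saturated in this sample (though a since-boot average can hide bursts during indexing).

```python
# Convert the iostat numbers from this thread into MB/s.
# iostat reports Blk_read/s and Blk_wrtn/s in 512-byte blocks.
BLOCK_BYTES = 512
PROVISIONED_IOPS = 1300  # from the reply above

tps = 132.44         # transfers per second (xvda)
blk_read_s = 2438.90
blk_wrtn_s = 3037.69

read_mb_s = blk_read_s * BLOCK_BYTES / 1e6
write_mb_s = blk_wrtn_s * BLOCK_BYTES / 1e6

print(f"read: {read_mb_s:.2f} MB/s, write: {write_mb_s:.2f} MB/s, "
      f"IOPS headroom: {PROVISIONED_IOPS - tps:.0f}")
```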
Digging further, I realise there are two scenarios and I have some hypothesis around it:
- With replication, it's reaching the disk watermark. Ingesting data in that state, I am guessing, may trigger relocation, which can make ingestion slow. I see in the logs:
[WARN ][o.e.c.a.s.ShardStateAction] [es6-2] [ad-events-index] received shard failed for shard id [[ad-events-index]], allocation id [x4BYctXfS1GAvxxxxxx], primary term , message [mark copy as stale]
[WARN ][o.e.c.a.s.ShardStateAction] [es6-2] [ad-events-index] received shard failed for shard id [[ad-events-index]], allocation id [NCuJipfyTxxAxq9TwZ_rBQ], primary term , message [mark copy as stale]
But in the shard history I do not see any relocation.
I will certainly retry with more disk space, but it would be great if someone could help me validate whether this could be the problem.
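A back-of-envelope check of the watermark hypothesis, using only the numbers from this thread and the default Elasticsearch disk watermarks (low 85%, high 90%, flood_stage 95%). Treating the 1.8 TB index as roughly 1.8 TiB and assuming perfectly even shard distribution is an assumption for illustration; since shards never balance perfectly, an average this close to 85% means a hot node can plausibly cross the low watermark even when the cluster-wide average does not.

```python
# Cluster-wide disk estimate with 1 replica (primaries stored twice).
NODE_DISK_GIB = 650
NODES = 7
PRIMARY_DATA_GIB = 1.8 * 1024  # ~1.8 TiB of primary data (rough)

total_gib = NODE_DISK_GIB * NODES        # ~4550 GiB cluster-wide
used_gib = PRIMARY_DATA_GIB * 2          # primaries + 1 replica
usage = used_gib / total_gib

print(f"estimated average disk usage: {usage:.0%}")
# Default disk-based shard-allocation watermarks:
for name, pct in [("low", 0.85), ("high", 0.90), ("flood_stage", 0.95)]:
    print(f"crosses {name} watermark ({pct:.0%}): {usage >= pct}")
```

The average lands at roughly 81%, under the 85% low watermark but with little headroom, so a few percent of shard imbalance on one node is enough to trigger the "mark copy as stale" / allocation churn seen in the logs.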
- Now if I remove replication, ingestion still takes a long time (8-10 hours), probably because the cluster is still recovering. The same ingestion job, run two days after removing the replica (a cool-down period), takes just 1.5 hours, 5 times faster. I saw some logs like:
[DEBUG][o.e.a.b.TransportShardBulkAction] [ad-events-index] failed to execute bulk item (update) BulkShardRequest [[ad-events-index]] containing  requests
org.elasticsearch.index.shard.IllegalIndexShardStateException: CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]
I am not sure what Elasticsearch is doing when we remove replication, apart from freeing disk space. Again, I don't see any relocation for this index in the shard history, but I do see 90-95% memory usage on some nodes.
It's not recommended to use index.translog.durability=async, but if we did, would it help at all? Our ingestion jobs would be faster, but Elasticsearch would still have to do the same amount of work to replicate fully.
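For completeness, if one did experiment with async translog durability (accepting the crash data-loss risk discussed above), it is a dynamic per-index setting applied via `PUT <index>/_settings`. A minimal sketch of the body; the 30s interval here is an arbitrary example value, not a recommendation (the default `sync_interval` is 5s):

```python
# Settings body to fsync/commit the translog in the background every
# sync_interval instead of on every bulk request. Dynamic setting, so it
# can be applied and reverted without closing the index.
async_translog = {
    "index": {
        "translog": {
            "durability": "async",   # default is "request"
            "sync_interval": "30s",  # example value; default is 5s
        }
    }
}

print(async_translog["index"]["translog"]["durability"])
```

As noted above, this only removes the per-request fsync cost; replicas still index every document, so it would not close an 11-hour gap on its own.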
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.