So I've got a cluster of 3x MASTER | 4x HOT | 4x WARM nodes. The goal of my project is to keep indices on the HOT nodes for about 1-2 weeks and then relocate them (with the help of Curator) to the WARM nodes.
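For context, the relocation is driven by a Curator allocation action roughly like this (a sketch, not my exact config; the `box_type` attribute name, index prefix and age are placeholders):

```yaml
# Hypothetical Curator action file: tag indices older than 14 days so
# Elasticsearch relocates them to nodes carrying node.attr.box_type: warm
actions:
  1:
    action: allocation
    description: "Move indices older than 14 days to WARM nodes"
    options:
      key: box_type
      value: warm
      allocation_type: require
      wait_for_completion: true
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 14
```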
We are talking about huge amounts of data, and my company came up with the bright idea of using ECS with its S3 API for it. So we are mounting S3 as a filesystem on RedHat (we have to), let's say on /data/, using S3FS, and then setting Elasticsearch's path.data to /data/node_1/, /data/node_2/ and so on. It's super slow, of course, but it doesn't blow up, so that's something positive.
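Concretely, the setup looks something like this (the bucket name, ECS endpoint and cache path are placeholders, not my real values):

```
# Mount the ECS bucket over its S3 API with S3FS, caching locally on /app/
s3fs warm-bucket /data -o url=https://ecs.example.com \
     -o use_path_request_style -o use_cache=/app

# Then, in each WARM node's elasticsearch.yml:
#   path.data: /data/node_1
```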
Cerebro / the shard statistics say that 100% of a shard has been sent to WARM, but S3FS has only cached that data locally on /app/; it is now pushing it to the S3 mount on /data/ as fast as it can, which of course is slow as hell.
Problem is: when I tried to allocate a 100 GB index (4 x 25 GB shards) on the WARM nodes, Elasticsearch hits that 100% and, after a few minutes (while S3FS is still sending from its cache on /app/ to S3 on /data/), it starts to reallocate those shards all over again, but in a different shard-node setup.
For example I'm starting the allocation process and shards are aligned like this:
Allocation hits 100%, the shards are still green on HOT and purple on WARM, S3FS still has about 15 GB of each shard left to send to /data/, but Elasticsearch starts to reallocate the shards all over again, and now it looks like this:
The whole process begins anew; S3FS can't send the data fast enough, so after hitting 100% the relocation rerolls over and over, forever.
Is there a way to force Elasticsearch to wait until those shards have been completely sent to S3 by S3FS? Maybe some timeout set to 60 minutes or so? I tried to find something in the docs, but sadly nothing helped.
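One thing that looks related is throttling recovery so the shard copy itself doesn't outrun S3FS (a sketch; the value is just a guess):

```
PUT _cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "20mb"
  }
}
```

But as far as I understand, that only slows the copy down; it doesn't make Elasticsearch actually wait for the S3FS flush, which is what I'm really after.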
Also, I have this setting:
cluster.routing.allocation.enable: "all" --> we're not using replicas right now, so we might as well set it to "primaries".
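For reference, that change would be something like this (a sketch via the cluster settings API):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
```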