Handling high task count on daily index rollover

Some stats about the cluster
Version 6.5.0
3 master nodes
12 hot data+ingest nodes
16 vCPU 32GB mem (16GB heap) each
12 warm data+ingest nodes
8 vCPU 32GB mem (16GB heap) each
1 ingest node (added recently for testing)

Ingestion is done through an ALB that routes to all the non-master nodes (warm+hot+client)
Data is sent from a Flink job using the Java ES client, plus Filebeat pods

Indices have no static mappings; dynamic mapping is used
Some indices have pretty complex structure but nothing too nested or exotic

Hot nodes hold data 3-4 days back and warm nodes hold all the rest of the data (depends on index that can be 30 days back)
Average number of shards per hot node is 40 with total size of 1.3TB (1.5B docs)

Since we significantly reduced the number of shards per index, we have no issues ingesting, indexing and querying data during the day.
However, every day at 00:00, when our filebeat pods start writing to the new indices, the task queue completely fills up.
It takes about 5 hours for the rejections to stop completely, and I know (but did not measure) that it also takes some time for the new indices to show up

How can this be improved? I guess that upgrading to a new version can help, and we are planning to do so, but we want to find a solution in the meantime.
I believe the number of indices/shards per node is well within recommended limits. Is index creation a CPU-intensive operation? Memory-intensive?

A relatively simple way to get around this would be to switch to the rollover API so that indices roll over based on time and/or size. This would make the various indices likely to roll over at different times, which would spread out the work.
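Roughly, that would look like the following (index and alias names here are just examples, and the size condition would need tuning to your shard-size target):

```
# Bootstrap the first index with a write alias
PUT logs-000001
{
  "aliases": {
    "logs-write": { "is_write_index": true }
  }
}

# Then call _rollover periodically (e.g. from cron or Curator). A new index
# is only created once one of the conditions is met, so different index
# families naturally roll over at different times instead of all at 00:00.
POST logs-write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_size": "200gb"
  }
}
```

Filebeat and the Flink job would then index into the alias (`logs-write` here) rather than computing daily index names themselves.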

For any other suggestions it would help to know how many daily indices/shards you are indexing into and how many shards and indices you have in the cluster as a whole as this drives the size of the cluster state.

  1. Let's say today is 1/5. If we create the indices for 1/6 during the day, Elasticsearch will still need to build the dynamic mappings on first write, which I would guess is the more time-consuming task, isn't it?
  2. We are still considering using the size rollover option.
  3. The daily stats are, on average:
    indices - 38
    shards - 160 (we aim for about 50GB per shard)
    doc count - 4.7B
    storage size - 4.6TB (1 replica for each index)
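For what it's worth, point 1 would amount to something like this (index name and shard counts are hypothetical):

```
# Run during off-peak hours on 1/5 to create the 1/6 index in advance.
# Shard creation and allocation happen now, but dynamic mapping updates
# would still hit the master on first write.
PUT myindex-2019.01.06
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}
```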

That'd be my guess too. Dynamic mappings are useful for getting started, but they're expensive since they involve the master in indexing, and that can become a bottleneck as there's only one elected master. The simple solution is to avoid them and define your mappings up-front as far as possible.
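As an illustration, an index template along these lines (field names and shard counts are made up) gives each new daily index its mappings at creation time, so indexing no longer needs to route mapping updates through the master:

```
PUT _template/logs-template
{
  "index_patterns": ["myindex-*"],
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "text" },
        "level":      { "type": "keyword" }
      }
    }
  }
}
```

On 6.x the mapping sits under a single type name (`_doc` here). Adding `"dynamic": "false"` at the top level of the mapping would additionally stop unmapped fields from triggering cluster-state updates, at the cost of those fields not being indexed for search.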
