Dealing with indexing spikes

Hi,

We have an application that, three times a day, indexes 100K-200K documents per node in about 2 minutes, which causes huge spikes in GC time and puts a lot of pressure on the cluster.

I have tried 1MB bulks (I will try raising that), resizing the bulk threadpool, and changing the refresh interval to 10s.
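
For reference, roughly what those changes look like through the official Python client; the host, index name, and document source below are just placeholders for our setup:

    # A rough sketch of the settings mentioned above, using elasticsearch-py;
    # host, index name, and documents are placeholders.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    # Relax the refresh interval during the indexing window (the default is 1s).
    es.indices.put_settings(
        index="my-index",
        body={"index": {"refresh_interval": "10s"}},
    )

    # Cap each bulk request at roughly 1 MB rather than only by document count.
    docs = [{"field": "value"}]  # placeholder documents
    actions = ({"_index": "my-index", "_source": doc} for doc in docs)
    helpers.bulk(es, actions, chunk_size=1000, max_chunk_bytes=1 * 1024 * 1024)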

Does somebody have another suggestion?


Ultimately you need to pay the "cost" of indexing, so other than spreading the load over a longer time, you could add more nodes during the indexing process.

It's better if the ingestion can happen smoothly; say, stretch the 2 minutes out to 20 minutes, or even consume the documents from Kafka. In any case, a profile could help find the bottleneck: when heavy ingestion happens, CPU, memory, and I/O can all be consumed heavily, and a simple vmstat will show where the likely bottleneck is.
Besides, I think G1 could also be tried. In my test with all default settings, ingesting 2M docs into ES 2.2, G1 stopped for a total of 3s 318ms (362 pauses) while CMS stopped for 8s 24ms (1190 pauses). So if a lot of full GCs happen, or even concurrent mode failures, G1 could be an option when JDK 8 is being used.
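
To make the "stretch it out" idea concrete, here is a minimal sketch of a rate-limited bulk loop, assuming the official Python client; the index name, batch size, and target rate are just example numbers:

    import time

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    DOCS_PER_MINUTE = 10_000  # e.g. 200K docs spread over ~20 minutes
    BATCH_SIZE = 500          # documents per bulk request

    def throttled_index(docs, index="my-index"):
        """Send docs in small bulks, sleeping so the sustained rate stays low."""
        batch = []
        for doc in docs:
            batch.append({"_index": index, "_source": doc})
            if len(batch) >= BATCH_SIZE:
                helpers.bulk(es, batch)
                batch = []
                # Sleep so the sustained rate stays near DOCS_PER_MINUTE.
                time.sleep(60.0 * BATCH_SIZE / DOCS_PER_MINUTE)
        if batch:
            helpers.bulk(es, batch)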

Hi,

Problem solved. For other users who encounter this problem:

  • Obviously, spreading the bulks over a longer timeframe helped.
    An important note:
    We had 4 nodes in the cluster, with 4 CPU cores each, and every node holds 10 shards (including replicas).
    We added 2 CPU cores to each node, which mitigated the GCs significantly. It's important to say that we didn't see the CPU working that hard, so this came as a surprise.
    My guess is that indexing is per shard, so the CPU had to context-switch a lot, which delayed the concurrent searches. Alternatively, I guess we could have reindexed the cluster into a significantly smaller number of shards (see the sketch after this note), and that would have helped too, since the CPU wasn't even close to reaching its limit.
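
For anyone who wants to try the "fewer shards" route instead, a minimal sketch of such a client-side reindex with the Python helpers; the new index name and shard count are assumptions, and newer Elasticsearch versions also offer the _reindex and _shrink APIs for this:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    # Create a new index with fewer primary shards (numbers are assumptions).
    es.indices.create(
        index="my-index-v2",
        body={"settings": {"number_of_shards": 2, "number_of_replicas": 1}},
    )

    # Scan the old index and bulk-index everything into the new one.
    helpers.reindex(es, source_index="my-index", target_index="my-index-v2")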

I hope someone can confirm or correct my assumption.