Dealing with indexing spikes

Hi,

We have an application that, three times a day, indexes 100K-200K documents per node in about 2 minutes, which causes huge spikes in GC time and puts a lot of pressure on the cluster.

I have tried 1MB bulks (I will try raising that), resizing the bulk threadpool, and changing the refresh interval to 10s.
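
For reference, roughly what those changes look like through the official Python client; the host, index name, and document source below are just placeholders for our setup:

    # A rough sketch of the settings mentioned above, using elasticsearch-py;
    # host, index name, and documents are placeholders.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    # Relax the refresh interval during the indexing window (the default is 1s).
    es.indices.put_settings(
        index="my-index",
        body={"index": {"refresh_interval": "10s"}},
    )

    # Cap each bulk request at roughly 1 MB rather than only by document count.
    docs = [{"field": "value"}]  # placeholder documents
    actions = ({"_index": "my-index", "_source": doc} for doc in docs)
    helpers.bulk(es, actions, chunk_size=1000, max_chunk_bytes=1 * 1024 * 1024)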

Does somebody have another suggestion?


Ultimately you need to pay the "cost" of indexing, so other than spreading the load over a longer time, you could add more nodes during the indexing process.

It's better if the ingestion can happen smoothly; say, stretch the 2 minutes out to 20 minutes, or even consume the documents from Kafka. In any case, a profile could help find the bottleneck: when heavy ingestion happens, CPU, memory, and I/O can all be consumed heavily, and a simple vmstat will show where the likely bottleneck is.
Besides, I think G1 could also be tried. In my test with all default settings, ingesting 2M docs into ES 2.2, G1 stopped for a total of 3s 318ms (362 pauses) while CMS stopped for 8s 24ms (1190 pauses). So if a lot of full GCs happen, or even concurrent mode failures, G1 could be an option when JDK 8 is being used.
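
To make the "stretch it out" idea concrete, here is a minimal sketch of a rate-limited bulk loop, assuming the official Python client; the index name, batch size, and target rate are just example numbers:

    import time

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    DOCS_PER_MINUTE = 10_000  # e.g. 200K docs spread over ~20 minutes
    BATCH_SIZE = 500          # documents per bulk request

    def throttled_index(docs, index="my-index"):
        """Send docs in small bulks, sleeping so the sustained rate stays low."""
        batch = []
        for doc in docs:
            batch.append({"_index": index, "_source": doc})
            if len(batch) >= BATCH_SIZE:
                helpers.bulk(es, batch)
                batch = []
                # Sleep so the sustained rate stays near DOCS_PER_MINUTE.
                time.sleep(60.0 * BATCH_SIZE / DOCS_PER_MINUTE)
        if batch:
            helpers.bulk(es, batch)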

Hi,

Problem solved. For other users who encounter this problem:

  • Obviously, spreading the bulks over a longer timeframe helped.
    An important note:
    We had 4 nodes in the cluster, with 4 CPU cores each, and every node holds 10 shards (including replicas).
    We added 2 CPU cores to each node, which mitigated the GCs significantly. It's important to say that we didn't see the CPU working that hard, so this came as a surprise.
    My guess is that indexing is per shard, so the CPU had to context-switch a lot, which delayed the concurrent searches. Alternatively, I guess we could have reindexed the cluster into a significantly smaller number of shards (see the sketch after this note), and that would have helped too, since the CPU wasn't even close to reaching its limit.
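
For anyone who wants to try the "fewer shards" route instead, a minimal sketch of such a client-side reindex with the Python helpers; the new index name and shard count are assumptions, and newer Elasticsearch versions also offer the _reindex and _shrink APIs for this:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    # Create a new index with fewer primary shards (numbers are assumptions).
    es.indices.create(
        index="my-index-v2",
        body={"settings": {"number_of_shards": 2, "number_of_replicas": 1}},
    )

    # Scan the old index and bulk-index everything into the new one.
    helpers.reindex(es, source_index="my-index", target_index="my-index-v2")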

I hope someone can confirm or correct my assumption.