Slow Bulk Updates on 6.2.3

ctrix · March 25, 2018, 8:36pm

Hello all,

I'm following up on this comment on GitHub and the suggestion that followed it.

I have a cluster of 8 blades with 32 GB of RAM and Xeon processors.

From a Kafka topic I receive a large number of documents that need to be inserted (or updated, if they already exist) into an index that is rotated weekly. I use a custom document ID for this purpose.
The documents are deduplicated over a 24-hour window, which means that I receive at most one update per document per day.

The document rate is quite high, and at the beginning of the week I can handle around 10K bulk updates per second. The cluster handles this load quite easily, even though I use a hot-warm setup with 3 blades dedicated to the hot weekly index.
Even though I use the bulk update API, this pattern effectively collapses into a bulk insert, because none of the documents exist in the index yet.
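
For reference, the kind of bulk request described above could be built roughly like this. This is only a sketch: the index name, the `doc` type, the IDs and fields are made up, and the use of `doc_as_upsert` is an assumption about how the upsert is expressed. The resulting NDJSON body would be POSTed to the `_bulk` endpoint.

```python
import json


def build_bulk_upsert_body(index, doc_type, docs):
    """Build an NDJSON body for _bulk with one update/upsert per document.

    `docs` is an iterable of (doc_id, fields) pairs. Each pair becomes an
    action line plus a payload line; with doc_as_upsert the partial doc is
    inserted as-is when the ID does not exist yet.
    """
    lines = []
    for doc_id, fields in docs:
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        payload = {"doc": fields, "doc_as_upsert": True}
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
    # The bulk API requires every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"


# Hypothetical weekly index and documents, for illustration only.
body = build_bulk_upsert_body(
    "events-2018-w13",
    "doc",
    [("a1", {"count": 1}), ("b2", {"count": 7})],
)
```

During the first 24 hours every action in such a body hits a missing ID, so it behaves like a plain insert; afterwards the same request has to fetch and rewrite existing documents.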

After exactly 24 hours, that is, as soon as I start receiving new data for existing documents, the insert rate drops to 400-500 queries per second, which makes the cluster unusable for this purpose. CPU usage also skyrockets and iowait on the indexing blades gets very high.

I have tried changing the number of shards, changing the number of nodes dedicated to hot indexing, and force-merging the index before inserting more documents, but nothing gets close to the performance I need.

I don't really know how to address this problem, which seems to have been an issue since ES 5.$something.

How can I address this issue?

I have read that I could try using index instead of update, preceded by a get for each document that I need to update.
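
As I understand it, that approach would look something like the sketch below: fetch the current sources with a multi-get, merge the new fields on the client, and send plain `index` actions through `_bulk`. The helper names, index name and `doc` type are hypothetical, and the merge here is shallow, which is an assumption about what the documents need.

```python
import json


def build_mget_body(doc_ids):
    """Body for a multi-get against a single index and type (_mget)."""
    return json.dumps({"ids": list(doc_ids)})


def build_bulk_index_body(index, doc_type, existing, incoming):
    """Build an NDJSON _bulk body of full-document `index` actions.

    `existing` maps doc_id -> current _source (or None if not found),
    `incoming` maps doc_id -> new fields. The merge is done client-side
    and is shallow: top-level keys in `incoming` overwrite `existing`.
    """
    lines = []
    for doc_id, fields in incoming.items():
        merged = dict(existing.get(doc_id) or {})
        merged.update(fields)
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(merged))
    return "\n".join(lines) + "\n"


# Made-up data: "a1" already exists, "c3" is new.
existing = {"a1": {"count": 1, "host": "blade-1"}, "c3": None}
incoming = {"a1": {"count": 2}, "c3": {"count": 1}}

mget_body = build_mget_body(incoming.keys())
bulk_body = build_bulk_index_body("events-2018-w13", "doc", existing, incoming)
```

Note that get-then-index is not atomic, so a concurrent writer could overwrite a change. With at most one update per document per day that may be acceptable here, but it is a trade-off.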

Do any of you experts have suggestions?

Thanks for your time!

Zaid_Amir · March 26, 2018, 1:40pm

Wow, I had this exact same problem in 5.0. It seems ES has not fixed it yet, which is kind of expected since ES is not really designed for frequent updates.

Yes, it seems the only way is to stop using bulk updates and go back to GET/INSERT. That's what I did when I first updated to 5.2, and it's still working fine.

bleskes · April 10, 2018, 12:57pm

@ctrix can you share the output of the hot threads API while the updates are going on? I want to see what the primary is doing and what is taking the time.
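
In case it helps with capturing that, here is a minimal sketch of calling the `_nodes/hot_threads` endpoint from Python. The base URL and the node name `hot-1` are placeholders, not values from this cluster, and the `threads` and `interval` values are just examples.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, node_ids=None, threads=3, interval="500ms"):
    """Build the hot threads URL, optionally restricted to specific nodes."""
    if node_ids:
        path = "/_nodes/" + ",".join(node_ids) + "/hot_threads"
    else:
        path = "/_nodes/hot_threads"
    query = urlencode({"threads": threads, "interval": interval})
    return base_url.rstrip("/") + path + "?" + query


def fetch_hot_threads(base_url, **kwargs):
    """Fetch the plain-text hot threads report (requires a reachable cluster)."""
    with urlopen(hot_threads_url(base_url, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")


# Placeholder host and node name; only the URL is built here, nothing is sent.
url = hot_threads_url("http://localhost:9200", node_ids=["hot-1"], threads=5)
```

Calling `fetch_hot_threads` a few times while the slow updates are running, against the nodes holding the primaries of the hot index, should give output worth pasting here.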

system · May 8, 2018, 12:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
