Bulk update is too slow in Elasticsearch 6.2

ashishtiwari1993 · April 24, 2018, 5:01am

Hi guys,

Here is my configuration:

ES version = 6.2
JVM = 30 GB
RAM = 128 GB
CPU = 24 cores
SDK = PHP

Bulk updates are too slow for me. I tried the given solutions, but they are not working. Any suggestions would be appreciated. Thanks.

Christian_Dahlqvist · April 24, 2018, 5:40am

How are you updating the documents? What performance are you seeing? What is the size and structure of your documents? How frequently is an individual document updated?

ashishtiwari1993 · April 24, 2018, 5:46am

I am bulk updating 500 documents at a time. Each document is approximately 32 KB. I have a queue system, so I wrote a daemon that continuously pops the queue in batches of 500 and sends the updates to ES.

Christian_Dahlqvist · April 24, 2018, 5:49am

How many parallel update threads are you running? Are you using nested documents or parent-child? How many updates do you get per second?

ashishtiwari1993 · April 24, 2018, 5:54am

20 PHP child processes are running. Some nested docs and the main doc are being updated. I am also using scripts. Most updates use 'doc_as_upsert'. With a batch size of 50, it reaches 4 bulk API requests/sec, which means 50*4 = 200 docs updated per second. I would like to update approximately 5k docs per second.
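
For reference, a bulk update with 'doc_as_upsert' is sent as newline-delimited JSON: one action line and one payload line per document. Below is a minimal Python sketch of building such a body (the thread itself uses the PHP SDK). The function name, the index name 'orders', and the type '_doc' are hypothetical, and the action line assumes the 6.x format that still carries '_type'.

```python
import json


def build_bulk_update_body(index, doc_type, docs):
    """Build an NDJSON body for the _bulk endpoint.

    `docs` is an iterable of (doc_id, partial_doc) pairs. All names here
    are illustrative assumptions, not taken from the original poster's code.
    """
    lines = []
    for doc_id, partial in docs:
        # Action line: identifies the document to update.
        lines.append(json.dumps(
            {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        ))
        # Payload line: merge the partial doc, or create the doc if missing.
        lines.append(json.dumps({"doc": partial, "doc_as_upsert": True}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_update_body(
    "orders", "_doc", [("1", {"status": "paid"}), ("2", {"status": "new"})]
)
```

At 200 docs/sec against a target of 5,000 docs/sec, throughput would need to rise about 25-fold.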

Christian_Dahlqvist · April 24, 2018, 5:56am

What does CPU usage, disk I/O and iowait look like on the Elasticsearch host? How many indices and shards are you actively updating?

ashishtiwari1993 · April 24, 2018, 6:02am

'iostat' shows:
Linux 3.10.0-693.11.1.el7.centos.plus.x86_64
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.71    0.48    0.43    0.02    0.00   96.37

Christian_Dahlqvist · April 24, 2018, 6:18am

Do you see anything in the Elasticsearch logs, e.g. around slow GC or merging falling behind?

ashishtiwari1993 · April 24, 2018, 6:23am

How frequently is an individual document updated? => two updates every 2 seconds per document
How many indices and shards are you actively updating? => 30 indices with 10 shards and 1 replica
I am not seeing any log entries related to GC or merging. Should I increase the number of threads, since the CPU is 96% idle? Or is there any other configuration?

Christian_Dahlqvist · April 24, 2018, 6:29am

If you update a document that is still present in the transaction log and has not yet been written to a segment, this will trigger a refresh. If you're frequently updating the same documents, this will hurt performance as a refresh is an expensive operation.

How many nodes do you have in the cluster? How much data do you have in total?

ashishtiwari1993 · April 24, 2018, 6:32am

Yes, I tried disabling, increasing, and decreasing refresh_interval (0/-1/30) but had no success. I have a single-node cluster. Right now I have about 10 million documents, and the number keeps increasing.

Christian_Dahlqvist · April 24, 2018, 6:39am

As you are updating documents quite frequently, I do not necessarily think a longer refresh interval will help at all. It may actually be better to leave it at the default 1 second.

ashishtiwari1993 · April 24, 2018, 6:41am

Hmm, okay. So will adding a node or increasing the number of threads solve this problem? Or is there any important configuration that I am missing?

Christian_Dahlqvist · April 24, 2018, 6:43am

I do not have a lot of experience with update-intensive use cases, so I am not sure how best to optimise this. Let's see if someone else chimes in.

ashishtiwari1993 · April 24, 2018, 12:16pm

Hi, I just noticed in my logs that 20 child processes running simultaneously send 4 bulk requests/sec, and 1 process alone also sends 4 bulk requests/sec. I think ES is not handling multiple connections for bulk. Is there any configuration or setting for the bulk API?

loren · April 24, 2018, 5:26pm

Man, I do, and it wasn't a good experience!

If you are updating the same document multiple times per second, and you have many threads of execution updating documents, you will bury Elasticsearch. Occasional bursts of updates: fine. Constant low-frequency updates: fine. Constant high-frequency updates: not fine. And it gets less fine the larger the doc size.

I would suggest using a different approach to handle the frequent updates. Perhaps an in-memory cache that periodically (e.g., once every minute or two) updates a record in ES.
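
One way to picture the cache suggested here, as a rough Python sketch rather than a tested design: partial updates are merged in memory per document ID, and the merged result is drained on a timer so that each document is written to ES at most once per interval. The class name, method names, and the 60-second default are assumptions. The merge is shallow, so it only suits flat partial documents, not scripted updates.

```python
import time


class UpdateBuffer:
    """Coalesce partial updates per document ID between flushes.

    Hypothetical helper for illustration; not part of any Elasticsearch SDK.
    """

    def __init__(self, flush_interval_s=60.0, clock=time.monotonic):
        self.flush_interval_s = flush_interval_s
        self._clock = clock
        self._pending = {}  # doc_id -> merged partial document
        self._last_flush = clock()

    def add(self, doc_id, partial):
        # Shallow merge: later values overwrite earlier ones for the same field.
        self._pending.setdefault(doc_id, {}).update(partial)

    def due(self):
        # True once the flush interval has elapsed since the last drain.
        return self._clock() - self._last_flush >= self.flush_interval_s

    def drain(self):
        # Hand back one merged update per document and reset the buffer.
        batch, self._pending = self._pending, {}
        self._last_flush = self._clock()
        return list(batch.items())
```

A daemon would call add() for each queue item and, whenever due() is true, send the result of drain() as a single bulk request. Anything still buffered is lost if the process dies, so this trades durability for fewer writes.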

ashishtiwari1993 · April 25, 2018, 4:48am

Thanks for the reply, loren. So is there any way I can update in real time? I have 2k to 3k updates per second. How can I handle this in real time, or is it not good practice to push data into Elasticsearch in real time?

loren · April 25, 2018, 5:29am

You can index tons of data into Elasticsearch very rapidly. Just not rapid updates, in my experience.

ashishtiwari1993 · April 26, 2018, 4:19pm

Hi guys, I debugged and found the cause. I was heavily using script conditions as well as updates with upsert. So I removed the script conditions, kept the update query with upsert, and performance improved. But I am still facing version conflicts when running multiple threads. Any idea how I can tackle this?

ashishtiwari1993 · May 7, 2018, 7:56am

Hi guys, I found a solution that helped me increase performance. You can check it here: https://gist.github.com/ashishtiwari1993/004a19f4a44efc214403a7fc1ee27cda#challenge-1-
