Bulk updates are extremely slow after upgrading to 5.2

My average doc size is around 1 KB. I do use scripted updates, but only with update_by_query, and I do not see anything alarming about those, as all my other thread pools seem to work blazingly fast.
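For illustration, my scripted updates are of this general shape (the index, query, and script here are simplified placeholders, not an exact request):

POST backup_index/_update_by_query
{
  "query": { "term": { "processed": false } },
  "script": { "inline": "ctx._source.processed = true", "lang": "painless" }
}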

Initially I was trying with a single thread; now I am running 10 threads. Performance improved, but the requests still take a long time to process.

I will try increasing the bulk size and let you know.

Is there perhaps a single monster-size doc that keeps growing?
I remember the story of a user with a large (>1GB !?) document that was continually being added to.

No, all documents are roughly the same size. I do have a few large documents, but those are never updated.

Can you perhaps show us exactly what you are doing? What does one of your requests look like? Do you have monitoring installed so you can share stats on indexing, etc.?

Here is a sample bulk fragment, shown as the newline-delimited format the _bulk API expects; all my other update requests look like this:

{"update":{"_id":"C5DBE5CD49E1425AABC146CDAEE7F3AF","_index":"backup_index","_type":"documents"}}
{"doc":{"cont_length":0,"content_ukw2":[""],"language_description2":0,"pagecount":1,"wordcount":0,"language":0,"doctype":3,"resolution":0,"thumb_place_x":90,"thumb_place_y":90,"image_orientation":0,"frames_count":4,"processed":true}}
{"update":{"_id":"E566319089024F39A53F74E4E1AC6036","_index":"backup_index","_type":"documents"}}
{"doc":{"cont_length":0,"content_en":[""],"language_description2":0,"pagecount":1,"wordcount":0,"title":"Time Matters","title_sort_ducet":"Time Matters","language":5,"doctype":3,"resolution":0,"thumb_place_x":90,"thumb_place_y":90,"image_orientation":0,"frames_count":4,"processed":true}}

I am now sending 10,000 requests per bulk, and the tasks show requests running for over an hour without completing. I am not sure I can wait any longer.
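For context, I am checking the running tasks with something like the following (the wildcard action filter is an approximation):

GET _tasks?detailed=true&actions=*bulk*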

I don't have monitoring enabled. Will install it now and check.

This is the current state of my cluster. Do you see any abnormalities?

I have fiddled with the number of items per bulk request and the number of threads running. Now I am managing about 8000 update requests per 20 minutes. It is still very slow. I do not see the delays in the monitoring graphs.

I converted my code to do a GET/INSERT instead of using Update or Bulk Update. I managed to update 1M documents in 10 minutes running 500 threads. This includes pulling the document, updating the fields, and indexing it back into ES.
My cluster was running mighty fine during this process and would definitely have accepted more inserts if I had wished to increase the number of threads. No issues with CPU, disk, or memory.
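Roughly, each worker thread now does the equivalent of these two calls (sketched here with the sample document from my earlier bulk fragment; the PUT body is the full _source returned by the GET with the changed fields applied client-side):

GET backup_index/documents/C5DBE5CD49E1425AABC146CDAEE7F3AF

PUT backup_index/documents/C5DBE5CD49E1425AABC146CDAEE7F3AF
{"cont_length":0,"content_ukw2":[""],"language_description2":0,"pagecount":1,"wordcount":0,"language":0,"doctype":3,"resolution":0,"thumb_place_x":90,"thumb_place_y":90,"image_orientation":0,"frames_count":4,"processed":true}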

This leads me to believe that there is something really wrong with the way ES Update works.

This is my latest index info. Since I changed the code, everything is working smoothly and updating the documents only takes milliseconds. The red line indicates the time when I updated the code to use a GET/insert combination instead of bulk updates.

I did not notice any change in the index rate, and I am not sure whether ES counts updates toward this rate, as it should have increased drastically.

I believe there is an issue with bulk updates and hope it gets fixed soon.

Have you done any profiling of this situation? For example, have you used the hot threads API to see what the shards are doing when executing the bulk requests? Have you attached a profiler to understand where the shard is spending its time when executing the bulk request? You should do this on a node holding a shard executing the bulk request, not on the coordinating node receiving the bulk request.
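For example, something along these lines, run against a node holding one of the primary shards while the bulk requests are executing:

GET _nodes/hot_threads?threads=10&interval=500ms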

Unfortunately, I have already migrated the code, and since the issue is happening on my production cluster, I cannot revert to using bulk updates for testing purposes. I would have loved to have known about these tools sooner so I could help fix the issue.

Hey all, I have the same problem:
Elasticsearch 5.2 with X-Pack
AWS EC2: 2 x i3.2xlarge, 61 GB RAM (31 GB heap), SSD
Ubuntu 16.04
A few indexes with 64 shards and 1 replica (360 GB total index size)
_bulk updates are very slow, with high CPU
_bulk indexing is fast

Example:

POST _bulk
{"update":{"_index":"cc_3","_type":"job","_id":"2124_cca74860dae5c0f7832d846823873808_228_i"}}
{"doc":{"additional_fields":null}}
{"update":{"_index":"cc_3","_type":"job","_id":"2124_cca74860dae5c0f7832d846823873808_228_i"}}
{"doc":{"additional_fields":{"ats":"none"}}}

Response took: 300-600 ms

Indexing a document with _bulk took: 5-6 ms

On Elasticsearch 2.4.1 everything works fine.

https://www.evernote.com/l/ApaWh-26Q5FA1JS1frd1ezxkHhdhjRnrWFI

https://www.evernote.com/l/ApZiq2B2IdROzJtTnAWH23y_8-1TGspyBj8

Likewise, we hit the same problem with ES 5.2: bulk updates are extremely slow (compared to bulk indexing). We had to apply a client-side workaround to reduce the number of updates we make, since ES was simply not keeping up.

I created a test index in the cloud to reproduce it:
https://discuss.elastic.co/t/cluster-e22dce-bulk-updates-are-extremely-slow/82181?source_topic_id=79450

Hot threads

   21.9% (109.5ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000001][bulk][T#2]'
     9/10 snapshots sharing following 73 elements
       sun.misc.Unsafe.park(Native Method)
       java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
       java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
       java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
       java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
       java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer.lambda$warmReader$2(IndexWarmer.java:165)
       org.elasticsearch.index.IndexWarmer$FieldDataWarmer$$Lambda$1824/1724080657.awaitTermination(Unknown Source)
       org.elasticsearch.index.IndexWarmer.warm(IndexWarmer.java:85)
       org.elasticsearch.index.IndexService.lambda$createShard$2(IndexService.java:345)
       org.elasticsearch.index.IndexService$$Lambda$1805/821728431.warm(Unknown Source)
       org.elasticsearch.index.engine.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:1226)
       org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:198)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:160)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:707)
       org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:344)
       org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:611)
       org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:165)
       org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:83)
       org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:78)
       org.elasticsearch.action.bulk.TransportShardBulkAction.executeUpdateRequest(TransportShardBulkAction.java:269)
       org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:159)
       org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:113)
       org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:69)
       org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:939)
       org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:908)
       org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113)
       org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:322)
       org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:264)
       org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:888)
       org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:885)
       org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147)
       org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1654)
       org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:897)
       org.elasticsearch.action.support.replication.TransportReplicationAction.access$400(TransportReplicationAction.java:93)
       org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:281)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:260)
       org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:252)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:235)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:109)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.lambda$messageReceived$0(SecurityServerTransportInterceptor.java:284)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$$Lambda$1597/1663975799.accept(Unknown Source)
       org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$null$2(ServerTransportFilter.java:164)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile$$Lambda$1617/1213880001.accept(Unknown Source)
       org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.maybeRun(AuthorizationUtils.java:127)
       org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.setRunAsRoles(AuthorizationUtils.java:121)

Can we get someone from Elastic to look into this? It's a major problem that has been reported by multiple users, not only in this thread but in others as well (Elasticsearch bulk update is extremely slow), and so far without any traction.

Glad I am not the only one who has reported this. Well, at least now I know that I was not doing anything wrong with bulk updates.

I honestly don't see why Elastic is ignoring this. I know this is probably not something that is going to be easy to fix as an incremental update to 5.x, but at least put it on the roadmap for 6.x.

The reason why it's slow is probably the following:

As of 5.0.0 the get API will issue a refresh if the requested document has been changed since the last refresh but the change hasn’t been refreshed yet. This will also make all other changes visible immediately. This can have an impact on performance if the same document is updated very frequently using a read modify update pattern since it might create many small segments. This behavior can be disabled by passing realtime=false to the get request.

An update operation is effectively a GET + INSERT.
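To illustrate with one of the document IDs from the example above: an explicit get can opt out of that refresh with realtime=false, but as far as I know the internal get performed by an update is always realtime, so every update pays for it:

GET cc_3/job/2124_cca74860dae5c0f7832d846823873808_228_i?realtime=false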

The solution would be to avoid updating the same documents over and over. If that is unavoidable, you should batch these updates at the application level (see the example below).
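The two updates to the same _id in the earlier _bulk example could then be collapsed client-side into a single update carrying only the final state:

POST _bulk
{"update":{"_index":"cc_3","_type":"job","_id":"2124_cca74860dae5c0f7832d846823873808_228_i"}}
{"doc":{"additional_fields":{"ats":"none"}}}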

It is unlikely that this is the cause: fetching the document over the network (GET), modifying it, and reindexing it (INSERT) is still faster than an update by several orders of magnitude.

We also discuss this in the topic Multiple update of one document in one _bulk too slow.
