Update/Upsert Performance Improvements

Hi all,

I have data that is updated very frequently, so I use bulk updates (50k documents, ~25MB per bulk) to update it in Elasticsearch.

If a document is already present, I use a scripted update (to increment a counter); if not, the upsert document is indexed instead.

While this works great on a fresh index (one bulk takes about 15 seconds), the second and later bulks (which mostly consist of updates) take around 3-4 minutes each.

Elasticsearch is running on a single server (36GB RAM, 20GB Heap Size, 24 cores, dedicated 1GBit/s NIC)

My elasticsearch.yml

<redacted>
## Threading
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

# Bulk pool
threadpool.bulk.type: fixed
threadpool.bulk.size: 60
threadpool.bulk.queue_size: 300

# Index pool
threadpool.index.type: fixed
threadpool.index.size: 20
threadpool.index.queue_size: 100

# Indices settings
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb

index.translog.flush_threshold_ops: 50000
</redacted>

The script looks like this:

{ "script" : "ctx._source.count += %d; ctx._source.touch = timestamp", "params": { "timestamp" : %d }}

Before updating the documents, refresh_interval is set to -1.

I've been monitoring the progress using the great bigdesk plugin and didn't notice any changes: the threadpool is fine (no queued requests), GC activity looks the same as before, et cetera.

Do you have any more hints where I could look for this bottleneck? Can I provide further details?

Cheers

From here you'll have to learn how to read java stack traces and identify hot spots from them. There are two tools available to you right now: the hot_threads api and jstack.

hot_threads attempts to guess which threads are causing trouble and gives you a snapshot of them. It works fine when a single action is slow. If you instead have lots of actions that are each slow but still finish within the hot_threads sampling window, it doesn't work well and you have to use jstack.
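If it helps, here is a minimal sketch of pulling the hot_threads report from a script, so you can grab it a few times while a bulk is running. The host and the default values are assumptions for a local single-node setup; `threads` and `interval` are query parameters of the hot_threads endpoint.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base="http://localhost:9200", threads=3, interval="500ms"):
    """Build the URL for the nodes hot_threads endpoint."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base="http://localhost:9200", **params):
    """Return the plain-text hot_threads report (needs a reachable node)."""
    with urlopen(hot_threads_url(base, **params), timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_hot_threads(threads=5)` during a slow bulk and saving the text lets you compare several snapshots afterwards.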

With jstack you have to take multiple dumps yourself and classify the threads manually. That isn't as hard as it sounds - I've done it with sed.
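To give an idea of what that classification can look like, here is a rough Python sketch (not the sed approach mentioned above). It assumes the usual jstack text layout: a header line starting with the quoted thread name, followed by indented `at ...` frames. The function name `top_frames` is made up for this example.

```python
import re
from collections import Counter

# A thread's header line in jstack output starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Stack frames are indented lines starting with "at "; keep only the method
# name so that different line numbers in the same method group together.
FRAME = re.compile(r'^\s+at (?P<method>[^\s(]+)')
# Elasticsearch thread names look like elasticsearch[node][pool][T#n].
POOL = re.compile(r'^elasticsearch\[[^\]]*\]\[(?P<pool>[^\]]+)\]')


def top_frames(dump_text):
    """Count (pool, top-of-stack method) pairs in one jstack dump."""
    counts = Counter()
    pool = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            match = POOL.match(header.group("name"))
            pool = match.group("pool") if match else None
            continue
        frame = FRAME.match(line)
        if frame and pool is not None:
            counts[(pool, frame.group("method"))] += 1
            pool = None  # only the first (top) frame of each thread counts
    return counts
```

Adding up the counters from several dumps taken a few seconds apart (`total += top_frames(text)`) shows which methods the bulk threads keep sitting in.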

Also have a look at the Elasticsearch logs to see if it is logging messages about merges falling behind. If it is then you might want to have a look at merge throttling.

I haven't found any indication of merges falling behind. Many thanks for those tools, I'm currently diving into them!

hot_threads is already telling me that Elasticsearch is pretty busy with the bulk requests:

   17.9% (89.5ms out of 500ms) cpu usage by thread 'elasticsearch[Abe Brown][bulk][T#5]'
 10/10 snapshots sharing following 15 elements
   groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:256)
   groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:245)
   groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:203)
   org.elasticsearch.script.groovy.GroovyScriptEngineService.compile(GroovyScriptEngineService.java:148)
   org.elasticsearch.script.ScriptService.getCompiledScript(ScriptService.java:409)
   org.elasticsearch.script.ScriptService.compile(ScriptService.java:396)
   org.elasticsearch.script.ScriptService.executable(ScriptService.java:518)
   org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:183)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:523)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:239)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:512)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

I'll take a further look with jstack.

Many thanks!

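I think it's telling you something is up with Groovy. It looks like it's doing a lot of compiling. Because the increment is substituted directly into the script text, every distinct value produces a different script that has to be compiled again. Maybe you should replace your script with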

{ "script" : "ctx._source.count += increment; ctx._source.touch = timestamp", "params": { "timestamp" : %d, "increment": %d }}
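To illustrate why that matters, here is a toy model in Python. It only assumes that compiled scripts are cached by their exact text; it is not Elasticsearch's actual cache (which also has a size limit and eviction), and `compile_count` is a made-up helper.

```python
def compile_count(script_texts):
    """Count compilations for a cache keyed by the exact script text."""
    cache = set()
    compilations = 0
    for text in script_texts:
        if text not in cache:
            cache.add(text)       # first time this text is seen: compile it
            compilations += 1
    return compilations


increments = range(1, 1001)

# Increment inlined into the script: a new script text for every value.
inlined = [
    f"ctx._source.count += {n}; ctx._source.touch = timestamp"
    for n in increments
]

# Increment passed as a parameter: the script text never changes.
parameterized = [
    "ctx._source.count += increment; ctx._source.touch = timestamp"
    for _ in increments
]

print(compile_count(inlined), compile_count(parameterized))
```

With a thousand distinct increments, the inlined variant needs a thousand compilations in this model and the parameterized one needs a single compilation.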

BTW the %d makes me think you are building the whole JSON blob with string substitution. That is probably safe for things like this, but you have to be super careful with escaping. Going with a JSON building library is probably safer.
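As a sketch of what that could look like in Python with the standard `json` module: the helper below builds the two lines for one bulk update action. The function name, the contents of the `upsert` document, and leaving the index and type to the request URL are all assumptions for illustration; only the script and its parameters come from this thread.

```python
import json

# Constant script text; the values travel in "params" instead.
SCRIPT = "ctx._source.count += increment; ctx._source.touch = timestamp"


def bulk_update_lines(doc_id, increment, timestamp):
    """Return the action line and body line for one scripted upsert."""
    action = {"update": {"_id": doc_id}}
    body = {
        "script": SCRIPT,
        "params": {"increment": increment, "timestamp": timestamp},
        # Assumed shape of the document created when the id does not exist.
        "upsert": {"count": increment, "touch": timestamp},
    }
    # json.dumps handles quoting and escaping of ids and values.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"
```

Joining the result for every document gives the newline-delimited bulk body, and ids containing quotes or backslashes are escaped correctly without any manual work.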

Wow, it worked! Many thanks for this one! Those tools are going to be on my toolbelt from now on :smile:

Thanks for the hint about the escaping within the json building.

I'm glad it worked for you! What does the performance look like now, btw?

I'm now at around 50-60 seconds per bulk, which suits our needs pretty well.