We have an indexing process that reads batches of data from a datasource and indexes them into an Elasticsearch cluster using bulk index requests (roughly one bulk per batch). The data in a single batch can go to several indices.
While doing some testing we noticed a significant degradation in bulk index request performance when the documents in a batch go to several indices. The more mixed the batch (and therefore the bulk), the slower the bulk index request.
We are on Elasticsearch version 2.2. Our cluster has 11 nodes (15 GB RAM and 8 CPU cores per node) and holds around 400 indices.
In one of our tests a single bulk contained documents for every one of the 400 indices. Another important detail is that we are performing around 500 bulk index requests per second.
The bulk requests use the default configuration (1,000 documents per bulk) and our mappings are static. We also have index auto-creation disabled.
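A simplified sketch of how one of our batches becomes a bulk request (using the elasticsearch-py client; field names like `target_index` and `body` are placeholders, not our real schema):

```python
# Simplified sketch of how one batch becomes a bulk request in our indexer,
# using the elasticsearch-py client. Field names like "target_index" and
# "body" are placeholders, not our real schema.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

def index_batch(batch):
    # Every document carries the index it belongs to, so a single bulk
    # request can fan out across many indices (the "mixed" case).
    actions = [
        {
            "_op_type": "index",
            "_index": doc["target_index"],  # e.g. logs_2015-08_performance_0
            "_type": "doc",                 # ES 2.x still requires a type
            "_source": doc["body"],
        }
        for doc in batch
    ]
    helpers.bulk(es, actions, chunk_size=1000)  # default 1k documents per bulk
```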
I think I know what might be happening. Can you take a few hot thread traces while repeatedly executing bulk requests with payloads targeting many indices, and again targeting one or a small number of indices?
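For example, something along these lines will grab a few snapshots from the hot threads API while your indexer is running (host and output file names are just placeholders):

```python
# Capture a handful of hot-threads snapshots while bulk indexing is running.
# Host/port and output file names are placeholders.
import time
import requests

for i in range(5):
    resp = requests.get("http://localhost:9200/_nodes/hot_threads")
    with open("hot_threads_{}.txt".format(i), "w") as f:
        f.write(resp.text)
    time.sleep(2)
```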
The main differences we found between the two traces were pieces like this:
[2017-01-12 15:39:37,595][TRACE][index.shard ] [node-5] [logs_2015-08_performance_0][2] skip writing shard state, has been written before; previous version: [4] current version [4]
[2017-01-12 15:39:37,595][TRACE][index.shard ] [node-5] [logs_2015-08_performance_0][26] skip writing shard state, has been written before; previous version: [4] current version [4]
[2017-01-12 15:39:37,595][TRACE][index.shard ] [node-5] [logs_2015-08_performance_0][18] skip writing shard state, has been written before; previous version: [4] current version [4]
.......
[2017-01-12 15:40:00,371][TRACE][transport.tracer ] [node-5] [2015940][indices:data/write/bulk[s]] sent error response (error: [[node-5][10.75.114.160:9305][indices:data/write/bulk[s][p]]])
[2017-01-12 15:40:00,371][TRACE][transport.tracer ] [node-5] [2015949][indices:data/write/bulk[s]] received request
[2017-01-12 15:40:00,371][TRACE][transport.tracer ] [node-5] [1995643][indices:data/write/bulk[s][p]] received response from [{node-5}{IOKuMzJbRjesv8NInGiuBQ}{10.75.114.160}{10.75.114.160:9305}{ips_id=6D46:6324, master=false}]
[2017-01-12 15:40:00,375][TRACE][indices.breaker.request ] [request] Adjusted breaker by [16440] bytes, now [16440]
[2017-01-12 15:40:00,375][TRACE][index.store.deletes ] [node-5][logs_2003-10_performance_0][14] StoreDirectory.deleteFile: delete file _2.fdt
[2017-01-12 15:40:00,375][TRACE][transport.tracer ] [node-5] [1995649][indices:data/write/bulk[s][p]] received response from [{node-5}{IOKuMzJbRjesv8NInGiuBQ}{10.75.114.160}{10.75.114.160:9305}{ips_id=6D46:6324, master=false}]
.......
[2017-01-12 15:40:00,416][TRACE][indices.breaker.request ] [request] Adjusted breaker by [-16440] bytes, now [0]
[2017-01-12 15:40:00,417][TRACE][transport.tracer ] [node-5] [1994226][indices:data/write/bulk[s][p]] received response from [{node-5}{IOKuMzJbRjesv8NInGiuBQ}{10.75.114.160}{10.75.114.160:9305}{ips_id=6D46:6324, master=false}]
[2017-01-12 15:40:00,417][TRACE][indices.breaker.request ] [request] Adjusted breaker by [16440] bytes, now [16440]
[2017-01-12 15:40:00,417][TRACE][transport.tracer ] [node-5] [2004520][indices:data/write/bulk[s]] sent response
[2017-01-12 15:40:00,417][TRACE][indices.breaker.request ] [request] Adjusted breaker by [-16440] bytes, now [0]
[2017-01-12 15:40:00,417][TRACE][transport.tracer ] [node-5] [1993906][indices:data/write/bulk[s][p]] received response from [{node-5}{IOKuMzJbRjesv8NInGiuBQ}{10.75.114.160}{10.75.114.160:9305}{ips_id=6D46:6324, master=false}]
[2017-01-12 15:40:00,417][TRACE][indices.breaker.request ] [request] Adjusted breaker by [16440] bytes, now [16440]
[2017-01-12 15:40:00,417][TRACE][transport.tracer ] [node-5] [2001951][indices:data/write/bulk[s]] sent response
[2017-01-12 15:40:00,418][DEBUG][indices.memory ] [node-5] recalculating shard indexing buffer, total is [1.4gb] with [195] active shards, each shard set to indexing=[7.7mb], translog=[64kb]
[2017-01-12 15:40:00,418][DEBUG][index.shard ] [node-5] [logs_2000-12_performance_0][2] updating index_buffer_size from [7.8mb] to [7.7mb]; IndexWriter now using [427204] bytes
[2017-01-12 15:40:00,418][DEBUG][index.shard ] [node-5] [logs_2000-12_performance_0][18] updating index_buffer_size from [7.8mb] to [7.7mb]; IndexWriter now using [431760] bytes
[2017-01-12 15:40:00,418][DEBUG][index.shard ] [node-5] [logs_2000-12_performance_0][26] updating index_buffer_size from [7.8mb] to [7.7mb]; IndexWriter now using [431760] bytes
[2017-01-12 15:40:00,418][DEBUG][index.shard ] [node-5] [logs_2000-12_performance_0][10] updating index_buffer_size from [7.8mb] to [7.7mb]; IndexWriter now using [430140] bytes
[2017-01-12 15:40:00,419][DEBUG][index.shard ] [node-5] [logs_2012-05_performance_0][24] updating index_buffer_size from [7.8mb] to [7.7mb]; IndexWriter now using [425584] bytes
[2017-01-12 15:40:00,419][TRACE][index.engine.lucene.iw ] [node-5] [logs_1998-01_performance_0][16] elasticsearch[node-5][refresh][T#7] DW: publishFlushedSegment seg-private updates=null
[2017-01-12 15:40:00,419][TRACE][index.engine.lucene.iw ] [node-5] [logs_1998-01_performance_0][16] elasticsearch[node-5][refresh][T#7] IW: publishFlushedSegment
Okay, that is not what I was looking for, but it still gives me enough information to conclude that it is what I suspected.
I was looking for the output of the hot threads API. If you can still provide that, and at the same time turn on trace logging for org.elasticsearch.indices.IndexingMemoryController (which is exactly where my suspicion lies), we can nail this down even tighter. It's not strictly necessary, though; I think we already have the answer.
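If you do decide to turn that logging on, it can be enabled at runtime through the cluster settings API. Judging by the [indices.memory] category in your log, the key is probably logger.indices.memory, but adjust it if your version expects a different name:

```python
# Sketch: raise the indexing memory controller's log level to TRACE at runtime.
# The logger key and host are assumptions; adjust for your setup.
import requests

requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"transient": {"logger.indices.memory": "TRACE"}},
)
```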
Here's what I think is happening: there is a memory budget that is shared among the index writers of all active shards. If you're writing to 400 indices, that budget is shared across the active shards of those 400 indices. I don't know how many shards you have, but the default budget is 10% of the heap, and if you divide that across 400 shards (assuming 1 shard per index) or 2,000 shards (assuming the default of 5 shards per index), you are left with a very small budget per shard. This means the shards are constantly flushing, which is terrible for performance.
You either need to change your write patterns (so that not all shards are active at the same time) or increase the memory budget (via indices.memory.index_buffer_size; the default is 10% of the heap).
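For the first option, one possible shape (only a sketch; the field names and threshold are made up) is to buffer documents per index on the client side and flush each index in larger, less frequent bulks, so that only a few indices' shards are actively being written to at any moment:

```python
# Sketch of a client-side write-pattern change: accumulate documents per index
# and flush an index only when its buffer is large enough, so that far fewer
# shards are actively being written to at the same time. The threshold and
# field names are placeholders.
from collections import defaultdict
from elasticsearch import helpers

FLUSH_THRESHOLD = 5000  # documents per index before we send a bulk

class PerIndexBuffer:
    def __init__(self, es):
        self.es = es
        self.buffers = defaultdict(list)

    def add(self, doc):
        index_name = doc["target_index"]
        self.buffers[index_name].append(doc)
        if len(self.buffers[index_name]) >= FLUSH_THRESHOLD:
            self.flush(index_name)

    def flush(self, index_name):
        docs = self.buffers.pop(index_name, [])
        actions = [
            {"_index": index_name, "_type": "doc", "_source": d["body"]}
            for d in docs
        ]
        helpers.bulk(self.es, actions)
```

For the second option, indices.memory.index_buffer_size is, as far as I know, a node-level setting in elasticsearch.yml (for example indices.memory.index_buffer_size: 20%) and needs a node restart to take effect.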